Research_Summary_2016 - Huber Group
Research Summary - Spring 2016

Wolfgang Huber
Research Group Leader and Senior Scientist

Contents
A Research Vision
B Summary of Work
C Future Plans
D Publications 2012-16
E List of External Grants (since 2012)
F Curriculum Vitae

A Research Vision

The unifying concept of my research is methodology: statistical expertise and the ability to invent new methods. I apply these where there is a gap whose closing will advance biology. I run an interdisciplinary group with three main aims.

The first aim is to drive forward the state of the art of statistics in biology – that is, the science of reasoning with uncertainty, making reliable inference based on incomplete, noisy or overwhelming data. But I also understand statistics as an instrument for discovery: a set of tools that help humans see interesting patterns in large datasets.

A second aim is to gain insight into pressing questions in drug-genotype interactions and precision oncology through proficient use of statistical computing. To achieve both of these aims, I collaborate closely with biomedical researchers who are equipped with exciting novel technologies and are producing novel data types.

Thirdly, I aim to advance translational statistics by making methods usable not only for experts, but for a wide range of users. This aim is embodied by my engagement in the Bioconductor project.

B Summary of Work

Research Highlights of the Last Four Years

From 2012 to the present, 22 papers were published with W. Huber as corresponding author and/or group members as (co-)first authors [1–18, 49, 50, 56, 66]. There were 67 papers altogether (Section D). Several are having an impact. Highlights are:

Statistical methods. We developed the first approach to false discovery rates in multiple testing that permits data-driven hypothesis weighting [68]. The power gains can be large, and the method is broadly applicable.

RNA-seq.
We developed what have become standard tools for RNA-seq analysis, most prominently DESeq2 for differential gene expression analysis [8, 10]. Moreover, we published htseq, DEXSeq [18] and a method for single-cell RNA-seq [49]. We used DEXSeq for a novel contribution to the debate on ‘junk’ versus ‘function’ in alternate RNA isoforms [15]. We used statistical modelling to weigh the extent of stochasticity and regularity in the promiscuous gene expression of medullary thymic epithelial cells [4].

Other ‘omics. We developed methods and Bioconductor packages for cancer genome sequencing [6, 12], 4C-seq [7] and iCLIP [25] and applied these in numerous collaborative projects. We contributed to the adaptation of the DESeq2 framework to other data types, such as Ribo-seq and ChIP-seq.

Translational statistics. Many powerful mathematical and computational methods exist but are difficult to access for a majority of biomedical scientists. We translate advanced ideas into practical methods and software. I took responsibility for the European presence of Bioconductor [1], a widely used bioinformatics software project, by organising developer conferences and annual summer courses and by obtaining EC network grants (RADIANT, SOUND) for the project.

Gene-gene & gene-drug interactions. We devised a method for automated inference of the direction of epistatic genetic interactions from high-content phenotyping data [2]. We partnered with the National Centre for Tumour Diseases (NCT) to translate high-dimensional phenotyping and gene-drug interaction screening into practical personalized medicine.

Systems microscopy. We developed methods for estimating quantitative biophysical models from time-resolved microscopy data and applied these in several successful collaborations with developmental biologists [5, 24, 48, 56].

Thermal proteome profiling. We recently started work on statistical methodology and computational infrastructure for thermal proteome profiling [9, 27].
Our aim is to make the technology as widely accessible and usable as possible – a new ‘workhorse’ for scientists both in fundamental and pharmaceutical research.

Reproducible research. All our major papers are accompanied by a complete transcript of all computations from raw data to figures, tables and numbers reported in the paper.

Further details on a selection of the above-mentioned highlights are given in the following.

B.1 Translational Statistics

The adjective translational is sometimes used for efforts to translate biological discoveries into something useful for medicine. I use the term translational statistics for efforts to make sophisticated mathematical discoveries and computational methods accessible to a wide range of natural scientists.

I have contributed to the Bioconductor project since 2002 [1, 8, 53, 112, 139, 140]. The project has been providing an energetic, fast-moving platform to the research community for collaborative, interoperable, scientifically leading software in genomics and quantitative biology. It has also become a platform for the publication of bioinformatic software that many authors aspire to. Bioconductor is the largest software project in bioinformatics, with several thousand users and hundreds of developers worldwide. It comprises more than 1000 software packages. I have outlined the aims of the project, and the means by which we achieve them, in a recent perspective paper [1].

My particular role in the project has been in the provision of mathematically sophisticated packages for the primary data analysis of popular technologies [6–10, 12, 18, 50, 68, 71, 82, 88, 104, 105, 109, 124, 154]. For some of them it might be fair to say that they were among the “killer applications” that helped bring new users to Bioconductor. A list of those software packages is provided in Section F, heading Software.

An important goal has been to facilitate interoperability of R/Bioconductor with other software projects.
For instance, the rhdf5 package provides an interface to the HDF5 data storage system. HDF5 is used in high-performance computing and permits efficient exchange of large, array-shaped datasets between different software systems. The RBioFormats package provides an interface to BioFormats¹, the leading solution for reading vendor-specific microscopy image data and metadata formats. The package lpsymphony interfaces to the powerful SYMPHONY optimisation package, an open-source solver for mixed-integer linear programmes.

Since 2005, an annual general Bioconductor conference has been held in the US each summer. Since 2010, I have coordinated annual European developer conferences, which take place in the winter and alternate between the UK and the continent (Heidelberg, Zurich). They usually attract 40-50 active and future package developers. For new users, I organise the annual CSAMA summer schools in Brixen, South Tyrol, which have taken place every year since 2004. These week-long compact courses host around 60 participants (places are usually booked out quickly) and are taught by high-calibre teachers, including R. Gentleman, M. Morgan, V. Carey, M. Love and S. Anders. Since 2015, I have been involved in the organisation of bi-annual Statistics in Genomics workshops at the ETH’s wonderful conference centre on Monte Verità, Ascona, Switzerland.

To support Bioconductor development, I have co-written the EC network grant RADIANT (2012-2015) and am coordinating the SOUND project (2015-2018). These grants include leading contributors to Bioconductor in Europe. They provide not only research funding, but also positions for staff to work on strategically important infrastructure- or support-oriented tasks. SOUND also includes a US partner, M. Morgan, the leader of the Bioconductor project.

B.2 DESeq, DESeq2 and DEXSeq

DESeq is a method and software package for the differential analysis of count data from high-throughput sequencing that we published in 2010 [82].
Since then, it has been cited over 2,400 times². With DESeq2, we have greatly extended its statistical sophistication (Figure 1) and the range of its applications, and improved the software user interface and robustness, documentation and associated training material [10]. The method is based on generalized linear models and uses empirical Bayes methodology to permit model parameter estimation even in the case of few (e. g., two) replicates. Contrary to some misperceptions, using only such a ‘small’ number of replicates is a reasonable, scientifically and economically efficient choice for designed experiments³. It is supported by progress in statistical modelling – in particular, empirical Bayes methodology and hierarchical models that share information between genes – over the last 15 years. Since its publication in December 2014, the DESeq2 paper [10] has been cited over 110 times⁴, and the package was downloaded from >35,000 unique IP addresses over the last year. DESeq2 is an example of relatively sophisticated statistical methodology that makes a practical difference to biologists.

Figure 1: DESeq2 uses empirical Bayes methodology to obtain stable estimates of logarithmic fold changes (LFC) and variances even when the number of replicates is small. In this figure from reference [10], panels A and B show MA-plots of the maximum likelihood (ML, A) and maximum a posteriori (MAP, B) estimates of LFC. Two genes with similar mean count and ML estimate of the LFC are highlighted by green and purple circles, and their normalized count data are shown in panel C. The green gene has low dispersion, the purple gene high dispersion. Panel D shows the densities of the likelihoods (solid lines), posteriors (dashed) and the empirical Bayes prior (solid black).

¹ http://www.openmicroscopy.org/site/products/bio-formats
² ISI Web of Science
³ It is helpful to distinguish between designed experiments, performed under well-controlled laboratory conditions, and studies, done with cohorts ‘in the wild’, e. g., human subjects in the clinic. For the latter, large cohort sizes (100s, 1000s) are needed, and analysis methods can be less reliant on the limma / edgeR / DESeq2-style empirical Bayes approach to information sharing across genes.
⁴ More than 450 if references to the bioRχiv preprint are included.

Moreover, we also published htseq and htseq-count for counting the overlap of aligned sequencing reads with genomic features. This is a basic step in the processing of RNA-seq data. The paper associated with the software [11] has been cited over 300 times and is mentioned in >1,000 papers according to the full-text search of PubMed Central⁵. It is the most prominent implementation of the ‘counting’ approach to RNA-seq⁶.

DEXSeq [18] addresses alternative isoforms. In comparison to approaches that try to reconstruct full transcripts before testing them for differential abundance across conditions, DEXSeq short-circuits the assembly and looks for differential exon usage directly. It has performed well in recent benchmarks⁷ compared to the aforementioned approaches – a result of the fact that the goal of full mammalian transcript reconstruction from Illumina HiSeq short reads remains elusive⁸.

⁵ http://www.ncbi.nlm.nih.gov/pmc/?term=htseq
⁶ More recently, methods that circumvent the alignment and feature-counting steps by directly assigning reads to target sequences via k-mer matching, such as sailfish, are gaining traction. Eventually, this approach is likely to make the use cases for htseq-count less numerous – but not those for differential expression analysis, i. e., DESeq2 or its related methods.
⁷ E. g., Soneson et al. (2015) http://dx.doi.org/10.1101/025387
⁸ Steijger et al. (2013) http://dx.doi.org/10.1038/nmeth.2714

Drift and conservation of differential exon usage across tissues in primate species. Using DEXSeq on multi-species, multi-tissue data, we have made a contribution to the discussion of ‘junk’ versus ‘function’ in alternate RNA isoforms [15]. We found that for a large fraction of tissue-specific isoform diversity seen in primates, the tissue-specific expression is not conserved even between closely related species. On the other hand, for the subset of highly expressed tissue-specific isoforms (3,800 exons in 1,643 genes), we do detect conserved tissue-specific usage across species. To the extent that such conservation is an indicator of selection for function, our analysis supports the view that, by and large, alternative isoform usage is leaky and noisy at low abundance levels, but more tightly controlled and functional for higher abundance transcripts.

For single cell RNA-seq data, Simon Anders published an influential method for distinguishing true biological variability from technical variability [49]. We used it to resolve a debate on the extent of stochasticity and regularity in the promiscuous gene expression programmes of medullary thymic epithelial cells [4]. Also with the Steinmetz lab, we mapped cell-to-cell variability of 3’ isoform choice by single-cell polyadenylation site mapping [32].

Extension to other data types. A special highlight here was that the very first high-throughput CRISPR/Cas9 screen⁹ was analysed with DESeq2. The fact that this was done without our direct involvement speaks for the usability of the software. As for our own efforts, we focused on high-throughput chromosome conformation capture assays, specifically 4C, HiC and ChIA-PET. We developed the Bioconductor package FourCSeq [7], applied it to research reported in Nature [42] and presented further results on analysis of HiC data in [69].

Documentation and usability. We published an end-to-end RNA-seq data analysis protocol oriented to practitioners in Nature Protocols [50]. This was written as a consensus document together with the authors of the main competing package, edgeR.
Two years later, we provided an updated and distinctly extended version in F1000Research [8].

B.3 Cancer Genomics

We developed the h5vc package, which leverages the high-performance data storage system HDF5 together with R/Bioconductor for large-scale analyses of genome sequencing data [12]. We also published the SomaticSignatures package, which identifies mutational signatures of single nucleotide variants (SNVs) in tumour genomes [6]. It provides infrastructure related to the methodology described by Nik-Zainal et al. (2012, Cell). We applied these tools in numerous collaborative projects, including the HeLa genome [17] and the first data-based estimation of position-specific error rates for each base in the human genome¹⁰.

B.4 Multiple Testing, False Discovery Rates and Hypothesis Weighting

When functional genomics data became available in the 1990s, a spike of interest arose in the topic of multiple testing. With the adoption of the false discovery rate (FDR) as a common experiment-wide summary and with practical computational methods¹¹, it seemed for a while that the topic was settled. However, as the size and complexity of datasets have increased, researchers have realized a major limitation of the currently used FDR methods: the exchangeability assumption. The only information used from each hypothesis test is its p-value. Other potentially useful information – such as the power of the test, the observed effect size, the prior probability of the null hypothesis – is effectively ignored. Although various ad hoc fixes and heuristics existed, they were unsatisfactory since they were statistically inefficient, required manual tuning, or were even fallacious. Our work provides a principled, data-driven and statistically near-optimal solution to the problem [68]. It generalizes earlier work [10, 81].

B.5 Gene-Gene and Gene-Drug Interactions

Automated phenotyping from microscopy image analysis.
Microscopy-based readouts are more informative for phenotyping than bulk viability or reporter assays, because they provide single-cell resolved data on processes such as cell cycle and proliferation, cell migration, trafficking and organelle morphology. We have created an R-based infrastructure – in particular our Bioconductor package EBImage – to support such high-throughput workflows, and have applied it widely in successful collaborations [2, 3, 5, 14, 24, 41, 44, 48, 52, 54, 56, 65, 67]. In comparison to other tools (e. g., CellProfiler, Matlab, ImageJ/Fiji), the strengths of our solution lie in the combination of functionality, speed and scriptability.

⁹ Zhou et al. (2014) http://dx.doi.org/10.1038/nature13166
¹⁰ Julian Gehring’s PhD thesis; paper to be published
¹¹ Most prominently, the method of Benjamini and Hochberg.

[Figure 2 image: clustered heat map of compounds, grouped into clusters C1–C18, with panel labels "similarity of multiparametric interaction profiles" and "structural similarity of compounds". Source: Breinig et al., Molecular Systems Biology 11: 846 (2015), published online December 23, 2015.]

Figure 2: Unsupervised clustering of drugs based on the correlation of their imaging-based high-content phenotypes in 12 different cell lines [3]. The correlation distances between each pair of compounds are shown in the upper left half of the matrix. For comparison, the lower right shows the structural similarities (Tanimoto distances).
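The two kinds of distance named in the Figure 2 caption can be illustrated with a minimal sketch. It assumes that the correlation distance is taken as one minus the Pearson correlation of two phenotype profiles and that the Tanimoto distance is computed on sets of structural fingerprint bits; the exact definitions used in [3] are not given here, and the function names and example values below are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def correlation_distance(x, y):
    """Assumed definition: 1 - Pearson correlation (0 = identical shape)."""
    return 1.0 - pearson(x, y)


def tanimoto_distance(bits_a, bits_b):
    """1 - |A & B| / |A | B| for two sets of fingerprint bit indices."""
    union = bits_a | bits_b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(bits_a & bits_b) / len(union)


if __name__ == "__main__":
    # Hypothetical phenotype profiles and fingerprints of two compounds.
    profile_1 = [0.2, 1.5, -0.7, 0.9]
    profile_2 = [0.1, 1.2, -0.9, 1.1]
    print(correlation_distance(profile_1, profile_2))
    print(tanimoto_distance({1, 2, 3}, {2, 3, 4}))
```

Either distance can then be fed into a standard hierarchical clustering routine; comparing the two resulting orderings is what the two halves of the matrix in Figure 2 convey.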
In terms of functionality, we leverage R’s rich toolset for statistics, machine learning and publication-quality data visualisation.

Another output of general interest is our new feature selection method [2]. It combines attractive properties of linear rotation methods (such as principal component analysis, linear discriminant analysis), namely, non-redundancy and signal-to-noise based dimension selection, with the advantages of feature selection, namely, interpretability and portability.

We performed the first gene-gene interaction screen by combinatorial RNAi in human cells [14, 44]. We demonstrated the power of genetically engineered cell lines and high-content phenotyping for discovering drug-gene interactions (Figure 2 [3]). We invented a new method for deducing directionality in gene-gene interaction data (Figure 3). The inferred directed arrows can often be related to temporal, logical, or causal hierarchy of the targeted gene products [2].
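The geometric idea behind the directionality call, as described in the caption of Figure 3, can be sketched in a few lines. The sketch assumes that phenotypes are vectors relative to the negative control, that the expected double knockdown phenotype of non-interacting genes is the sum of the single gene effects, and that a direction is called when the interaction vector π is nearly parallel or anti-parallel to one single gene effect. The cosine threshold, the function names and the numbers are hypothetical; this is not the statistical model fitted in [2].

```python
import math


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def subtract(u, v):
    return tuple(a - b for a, b in zip(u, v))


def norm(u):
    return math.sqrt(sum(a * a for a in u))


def cosine(u, v):
    """Cosine of the angle between u and v; 0.0 if either vector is zero."""
    nu, nv = norm(u), norm(v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def interaction_vector(single_a, single_b, double_ab):
    """pi = observed double knockdown minus the additive expectation."""
    return subtract(double_ab, add(single_a, single_b))


def call_direction(single_a, single_b, double_ab, min_abs_cosine=0.9):
    """Return (gene, orientation) if pi is (anti-)parallel to one single
    gene effect, else None. min_abs_cosine is an illustrative threshold."""
    pi = interaction_vector(single_a, single_b, double_ab)
    if norm(pi) == 0.0:
        return None  # no interaction at all
    cos_a, cos_b = cosine(pi, single_a), cosine(pi, single_b)
    gene, best = ("A", cos_a) if abs(cos_a) >= abs(cos_b) else ("B", cos_b)
    if abs(best) < min_abs_cosine:
        return None  # interaction present, but no clear direction
    return gene, "parallel" if best > 0 else "anti-parallel"


if __name__ == "__main__":
    # Hypothetical 2-D phenotypes (e.g. cell number, area of nuclei).
    gene_a, gene_b = (-1.0, 0.5), (0.2, -0.8)
    observed = (0.1, -0.75)  # pi points opposite to the effect of gene A
    print(call_direction(gene_a, gene_b, observed))
```

With real data, replicate noise makes a hard threshold too crude, which is why a model-based fit over replicates, as in [2], is preferable to this geometric shortcut.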
The method is applicable to multivariate phenotypes, and in particular to features from high-content screening. Besides gene-gene interactions, it will also be applicable to gene-drug or drug-drug interactions.

We are currently pushing forward this line of work from laboratory cell lines to large cohorts of primary cancer cells, in an exciting collaboration with haematologists at the National Centre for Tumour Diseases (Figure 4).

B.6 Reproducible research

We have established a system of supplementary information that we use for all our major papers. It allows readers to fully reproduce the reported results from raw data to all figures, tables and numbers. We provide these packages for the free, open-source R system, most of them hosted on Bioconductor¹². The packages contain the raw data files and custom-written procedures, including standard R-style documentation in manual pages and literate programming documents. These are documents authored with the knitr system that mix computer code and human-readable narrative and are executable by anyone. In this way, readers can not only reproduce what we did, but also check the effect of variations of our analysis choices on the results. Moreover, they may take our methods and adapt them to their data. Besides the direct utility of this information, our aim is also to demonstrate across a range of journals and communities that it is possible to move beyond supplementary information in static PDF files to support a paper.
These include:

Topic | Journal | Package/URL
Single cell transcriptome analysis in the early mouse embryo | Nature Cell Biology [40] | Hiiragi2013
Live-cell microscopy study of cell migration in the fish embryo | Nature [48] | DonaPLLP2013
First comprehensive RNA interactome | Cell [66] | Website
Map of genetic interactions in human cancer cells with RNAi and multiparametric phenotyping | Nature Methods [14] | HD2013SGI
Large-scale directional genetic interaction map in fly | eLife [2] | DmelSGI
Mapping of signalling networks through synthetic genetic interaction analysis by RNAi | Nature Methods [71] | RNAinteractMAPK
Chemical-genetic interaction map of small molecules using high-throughput imaging in cancer cells | Mol. Syst. Biol. [3] | PGPC
Single Cell RNA-Seq | Nature Immunology [4] | Single.mTEC.Transcriptomes
Protein turnover in embryos based on tandem fluorescent timer microscopy | Development [5] | TimerQuant
RNA-Seq analysis end-to-end workflow | F1000 Research [8] | Webpage
RNA-Seq analysis method | Genome Biology [10] | DESeq2, Webpage
Dynamical modelling of cell cycle phenotypes from genome-wide RNAi live-cell imaging | BMC Bioinformatics [16] | mitoODEdata
Drift and conservation of differential exon usage across tissues in primate species | PNAS [15] | PDF vignette
Differential exon usage from RNA-Seq method | Genome Research [18] | DEXSeq, pasilla
Furrow segmentation in live imaging of optogenetic experiment | Developmental Cell [24] | furrowSeg
Multiple testing methods paper | bioRχiv [68] | github

¹² https://bioconductor.org

[Figure 3 image: page excerpt from Fischer et al., eLife 2015;4:e05464, DOI: 10.7554/eLife.05464 (Figure 4 there, "Deriving directional genetic interactions"), showing schematic models for non-interacting, alleviating and aggravating gene pairs and data for Cdc23 and sti. According to the embedded caption, the expected double knockdown phenotype for non-interacting (NI) genes is the sum of the single gene effects, black arrows depict the genetic interaction π, and a directional genetic interaction is called whenever π is parallel or anti-parallel to one of the single gene effects.]

Figure 3: Data-based inference of directional epistatic genetic interactions. Multivariate phenotypes were computed from images of cells treated with combinatorial libraries of single and double RNAi knockdowns [2]. Each phenotype was represented as an n-dimensional vector; the origin of the vector space was fixed such that the null vector is the negative control. For visualisation, n = 2 here: cell number and area of nuclei. In [2], we used n = 21. It turns out that in many cases the double knockdown phenotype vector of two genes A and B is approximately collinear with that of one of the two genes, but is either increased or decreased. These four scenarios are depicted schematically on the left. The middle and right panels show data for two exemplary genes, sti and Cdc23, for four replicate experiments. The data are best fit by model B→A, indicating that loss of function of Cdc23 reverts the phenotype of sti. Biologically, this is explained by the fact that the cytokinesis regulator sti acts chronologically after the APC/C member Cdc23 in mitosis. In Fig. 5 of the paper [2] we showed how to derive a dense network of such directional epistatic interactions for mitosis-relevant genes. Note: the images shown here represent only a small zoom-in view of the images analysed.

Collaborator | Institution | Topic, resulting publications and funding
Lars Steinmetz | EMBL | Transcriptomics, systems genetics [4, 15, 17, 21, 32, 45, 49, 51, 58, 69, 72, 74, 90, 96, 97, 99, 103, 122, 124]
Michael Boutros | DKFZ | Gene-gene and gene-drug interactions, high-content phenotyping [2, 3, 14, 44, 71, 78, 79, 83, 84, 93, 123]
Martin Morgan | RPCI (Buffalo, USA) | Bioconductor – software for genome-scale data analysis [1, 53, 102]. Funding: BIGDATA, SOUND
Thorsten Zenz | NCT, DKFZ | Cancer pharmacogenomics [28, 31, 70]. Funding: SOUND, TRANSCAN GCH-CLL
Jan Korbel | EMBL | Cancer genomics [17, 80, 91]. Funding: BioTop, HD-HuB
Mikhail Savitski | EMBL | Thermal proteome profiling – statistical method development [9, 27]
Eileen Furlong | EMBL | 4C data analysis [7, 42, 118]
Jeroen Krijgsveld | EMBL | Mass spectrometry based quantitative proteomics [13, 26, 43, 59, 61, 64, 66]
Susan Holmes | Stanford | Statistical methods for high-throughput biology
Jan Ellenberg | EMBL | Systems microscopy [16, 86, 87]. Funding: Systems Microscopy
Darren Gilmour | EMBL | Quantitative modelling from live cell imaging of cell migration [48, 5]
Stefano de Renzis | EMBL | Optogenetic study of tissue morphogenesis [24]
Takashi Hiiragi | EMBL | Single cell transcriptomics [40]
Michael Knop | Heidelberg | Quantitative modelling of microscopy data for protein turnover [41, 67]
Andreas Trumpp | DKFZ | RNA-seq data analysis [13, 20, 43]
Matthias Hentze | EMBL | RNA interactome – statistical method development [25, 60, 61, 63, 66]. Funding: joint EIPOD
Alvis Brazma | EBI | Quantitative methods for RNA-seq; imaging bioinformatics [29, 46, 75, 95, 105, 106, 131]. Funding: Systems Microscopy
Gitte Neubauer, Gerard Drewes | Cellzome / GSK | Thermal proteome profiling, high-content phenotyping and multi-omics [9, 27]. Funding: GSK postdoc fellowship; joint EIPOD
Judith Zaugg | EMBL | eQTL analysis – statistical method development [30, 68]
Peer Bork | EMBL | Bioinformatics pipelines, statistical methods [76]. Funding: HD-HuB

Table 1: Overview of collaborations. Resulting publications and joint research grants (see also Section E) are stated where available.

C Future Plans

Biostatistics for the 21st Century

The ultimate goal of my research is the successful application of multi-omics and computational reasoning to personalised health and medicine. My distinctive mark will be the combination of statistical methods innovation and practical application to leading-edge experiments or studies. I will continue to seek out collaborations with biotechnology developers and biomedical researchers. I also plan to invest in the immersion of physician-scientists into genomic big data analysis.
In terms of data types, the leading themes will be:
• New technologies in nucleotide sequencing, proteomics, imaging, real-time monitoring
• Pervasive longitudinal multi-omic data
• Single-cell resolution for ever more assays
• High-throughput genetics and precision oncology

In terms of methods:
• Data heterogeneity, data missing not at random and other biases
• Structured learning
• Translational statistics

C.1 New Technologies

I aim to create innovative computational algorithms to mine the big and complex data that arise as part of developing new biotechnologies and applying them to novel areas of biology. Successful examples include microarrays [112, 139, 154], tiling arrays [122, 124], collaborative statistical computing [1, 139], RNAi [14, 71, 123], RNA-seq [8, 10, 11, 18, 50, 82], 4C [7, 42], iCLIP [25], single-cell RNA-seq [4, 49], high-content phenotyping [2, 3, 54, 83, 84], iTRAQ [88] and thermal proteome profiling [9, 27]. Current foci are:
• Thermal proteome profiling and other applications of quantitative mass spectrometry
  – Data-driven biophysical modelling of melting curves
  – Rich multiparametric hierarchical models and (empirical) Bayes methods to make them identifiable from data
• Single cell sequencing
  – Dimension reduction, detection and quantitative modelling of underlying structures: trajectories, gradients, bifurcation points, Waddington landscapes
  – Integrating multiple layers of data (e. g., DNA, transposase-accessible chromatin, RNA)
• Imaging-based phenotyping of tumour models
• High-throughput genetics

I have always been keen to spot opportunities that might arise from early access to exciting new data types. Potential fields of future engagement are imaging (high-throughput super-resolution microscopy for spatially resolved single-cell ‘omics), microfluidics, high-throughput synthetic biology (e. g., CRISPR) and third-generation sequencing.

C.1.1 Pervasive Longitudinal Multi-Omic Data

Humans are now the best-studied model organism.
There are 7 billion individuals to be genotyped and phenotyped. There is a potential for extremely rich phenotypes, as the costs do not need to be borne by research budgets. We can use data from clinics, which among other things are large phenotyping centres funded by health systems¹³. Moreover, wearable devices and the Internet of Things are emerging. They will provide rich data on life-styles and physiological parameters also from healthy humans.

¹³ In 2013, 17.1% of the GDP of the USA was spent on health care, compared to 2.8% for research and development (incl. all sectors, not only health). Source: The World Bank, http://wdi.worldbank.org/table/2.15 and http://wdi.worldbank.org/table/5.13

‘Omic datasets of the past were from single time points, were pieced together from ad hoc cohorts, had small sample sizes and used a single technology (e. g., microarrays). In contrast, datasets of the future will be pervasive (large cohorts, commoditized technologies), will be assayed at many time points during healthy life and disease, and will use multiple ‘omic technologies to cover the range of relevant biology. Taken together, these developments will allow us to drive forward personalized medicine – the use of ‘omics and systems biology in evidence-based medicine – and personalized health – managing healthy life using new technologies (cf. the conference I co-organised, Section F).

To help address associated challenges, I have assembled the international research network SOUND¹⁴. SOUND is funded by the European Commission within its Horizon 2020 Research and Innovation programme “Personalising Health and Care” and runs from 9/2015 to 8/2018. The partners comprise bioinformatician-statisticians and physician-scientists from leading institutions in personalized medicine including NCT and EMBL Heidelberg, ETH and University Hospital Zurich, TU Munich, IDMEC Lisbon, BDD in The Hague and the Roswell Park Cancer Institute (USA).
The objective of SOUND is to create the bioinformatic tools for statistically informed use of personal ’omic data in medicine, including cancers and rare metabolic diseases. Its partners have a strong track record and future commitment to Bioconductor (see Section C.3.3). Bioconductor has been exceedingly successful in enabling researchers to analyse the ‘omic datasets of the past, and the aim of SOUND is to help move Bioconductor forward so that physician-scientists and biological researchers can effectively mine the pervasive longitudinal multi-omic data of the future.

C.1.2 Single-Cell Resolution for Ever More Data Types

Many molecular biology technologies were developed to work on bulk samples, i. e., on populations of millions of cells and billions of molecules. These numbers are coming down. In 2015, single-cell RNA sequencing for tens of thousands of cells (drop-seq) and the parallel sequencing of the same single cell’s RNA and DNA-methylation status were reported¹⁵. Other assays (e. g., transposase-accessible chromatin, ATAC-seq) are sure to follow. New developments in chemical biology, fluorescent probes and super-resolution microscopy are beginning to enable the spatial localization and quantification of specific RNA (and DNA) sequences at single molecule resolution.

For the statistician, these data offer exciting opportunities:

Error modeling – the technologies will have imperfect sensitivities and specificities. False positives and false negatives will not occur randomly, but often depend on biophysical biases (e. g., sequence, internal state, environment) that need to be discovered, quantitatively modelled and estimated.

Signal processing – there is a need to design clever codes (e. g., molecular barcodes) and to later deconvolute them, possibly in complex combinatorial ways and in the presence of error; see, e. g., the work by Xiaowei Zhuang’s lab on spatially resolved multiplexed RNA profiling in single cells¹⁶.
Beyond averages – we will get variances and indeed full distributions, which need to be accurately and robustly estimated, and compared with each other (e. g., between cells with and without a stimulus).

Patterns – what is noise, what is systematic behaviour? Variations that cancel out on average may or may not be actively regulated and systematic within single cells, and reveal important mechanisms. We addressed an instance of this question in [4]. Other examples are fluctuations in protein abundance that might be correlated by processes ensuring stoichiometry of operational units, or cellular localisation.

C.2 Application Areas: High-Throughput Genetics and Precision Oncology

This line of research is a continuation of our successful work on gene-gene and gene-drug interactions (Section B.5). I plan to conduct it primarily through two strong, cross-fertilizing collaborations, one with a technology and cell line model focus and one with a translational and clinical focus.

¹⁴ http://www.sound-biomed.eu
¹⁵ Angermueller et al. (2016) http://dx.doi.org/10.1038/nmeth.3728
¹⁶ Chen et al. (2015) http://dx.doi.org/10.1126/science.aaa6090

[Figure 4: two ternary plots (left and right panel), each with axes BTK (ibrutinib), MEK (selumetinib) and MTOR (everolimus), scaled from 0 to 100.]

Figure 4: Pharmacogenomics of drug sensitivity. The position of each point in the ternary plots shows the relative response of a patient-derived primary chronic lymphocytic leukaemia (CLL) sample to each of three drugs (ibrutinib, selumetinib, everolimus) that specifically target three different signalling kinases (BTK, MEK and MTOR, respectively). The circle size represents the average response of the sample to all three drugs. The plot highlights pathway-specific dependency distributions. While the majority of CLL samples with unmutated IGHV locus (left panel) depend about equally strongly on BTK and MEK activity, the distribution in CLL with mutated IGHV locus (right panel) is more dispersed and shows a subgroup that responds to MTOR inhibition and less to the other inhibitors.

C.3 Fundamental Problems in Statistics

C.3.1 Data Heterogeneity, Data Missing Not at Random, and Biased Sampling

The data heterogeneity challenge in multi-omics derives from the fact that for different ‘omic layers, different types of features are interrogated. DNA-related data are reported in chromosomal coordinate systems. The central dogma links these to RNA- and protein-related data, but the mapping can become arbitrarily complicated due to splicing, paralogy, post-transcriptional and post-translational modifications. Moreover, these processes may themselves be affected by the treatment of interest or differ between individuals. For metabolites and drugs, the link to the other coordinate systems is even less well defined. Moreover, even though in the simplest case all levels of multi-omic data are measured simultaneously on the exact same samples, in practice they may be taken at more or less different body sites or with more or less time between them. Altogether, this means that while ’old’ omic data can be conveniently modelled by a 2D matrix (features × samples), multi-omic data are more complex than adding a 3rd dimension to the matrix: the mappings between features and samples at different levels are fiddly, dynamic and uncertain. We work on concepts, algorithms and software to address such challenges.

Sampling is at the basis of much of statistics: voter polls are made not by asking everyone who will vote, but by asking a sufficiently large and representative sample. Similarly, in RNA-seq or ChIP-seq we do not sequence every DNA molecule that is theoretically available. Complications start when the sampling is biased. If the bias is precisely known, one can try to adjust for it.
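As a minimal sketch of the known-bias case, suppose the inclusion probability of every sampled unit is known exactly. Weighting each observation by the inverse of that probability (the self-normalised Horvitz-Thompson estimator) then corrects the sample mean. The function name `ipw_mean` and the toy numbers are illustrative assumptions and are not taken from any analysis in this summary.

```python
from typing import Sequence


def ipw_mean(values: Sequence[float], inclusion_probs: Sequence[float]) -> float:
    """Self-normalised inverse-probability-weighted estimate of a population mean.

    Hypothetical helper: assumes the inclusion probability of every
    sampled unit is known and lies in (0, 1].
    """
    if len(values) != len(inclusion_probs):
        raise ValueError("values and inclusion_probs must have equal length")
    if not values:
        raise ValueError("need at least one observation")
    if any(p <= 0.0 or p > 1.0 for p in inclusion_probs):
        raise ValueError("inclusion probabilities must lie in (0, 1]")
    # Units that were unlikely to be sampled stand in for more of the population.
    weights = [1.0 / p for p in inclusion_probs]
    weighted_sum = sum(w * v for w, v in zip(weights, values))
    return weighted_sum / sum(weights)


# Toy population (assumed): two equally large groups with values 10 and 20,
# so the true mean is 15. Group A is sampled with probability 0.8 and
# group B with probability 0.2, so A is over-represented in the sample.
sample_values = [10.0] * 8 + [20.0] * 2
sample_probs = [0.8] * 8 + [0.2] * 2

naive = sum(sample_values) / len(sample_values)  # pulled towards group A
adjusted = ipw_mean(sample_values, sample_probs)
print(f"naive mean = {naive:.2f}, adjusted mean = {adjusted:.2f}")
```

The correction works here only because the inclusion probabilities are given.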
But in most cases, detecting, modelling and quantifying the important biases is part of the analyst’s task. Furthermore, she can feed such observations back ’upstream’ to improve technologies and experimental designs.

A related problem is data missing not at random. For instance, in single-cell RNA-seq, some genes may go undetected and unreported due to low abundance, but the probability of such drop-out events may depend on biochemical and biophysical factors in complex ways.

All of these challenges require deep engagement with the data, good mechanistic understanding of the data-generating biology and technologies, but also of the downstream inferential expectations and, not least, mastery of statistical tools including visualisation and regression modelling.

C.3.2 The Importance of Structure

If we want to estimate any kind of statistical or biophysical model in the high-dimensional setting, we need to impose additional structure onto the data. For the past twenty years, sparsity has been a popular and powerful structural assumption. The lasso is a popular incarnation of this, but more abstractly, the whole multiple testing field has made the same assumption: “only a few genes are truly differentially expressed”. Imposing such structural assumptions makes intractable problems tractable and provides interpretable statistical results. Nevertheless, blindly using a sparsity assumption can lead us astray, especially in heterogeneous settings. We need to apply our accumulated biological knowledge to infer structural patterns. For instance, known signalling or metabolic pathways can impose natural structures on genetic or metabolomic datasets. We will continue to develop regularization strategies based on prior biological knowledge.

I am particularly excited about developing methods that can learn or update structural assumptions in a data-driven way (our recent work [68] is a step in this direction).
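To illustrate how prior structure can enter multiple testing, here is a minimal sketch of the classic weighted Benjamini-Hochberg procedure with fixed, pre-specified weights. It is not the method of [68], which learns the weights from the data; the function name `weighted_bh`, the p-values and the weights are assumptions chosen for illustration.

```python
from typing import List, Sequence


def weighted_bh(pvalues: Sequence[float], weights: Sequence[float],
                alpha: float = 0.1) -> List[bool]:
    """Weighted Benjamini-Hochberg with fixed, non-negative weights.

    Hypothetical helper: weights are rescaled to average 1, each p-value
    is divided by its weight, and the usual step-up rule is applied.
    """
    m = len(pvalues)
    if m == 0 or len(weights) != m:
        raise ValueError("pvalues and weights must be non-empty and equally long")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")
    mean_w = sum(weights) / m
    norm_w = [w / mean_w for w in weights]
    # A weight of zero means the hypothesis is effectively never rejected.
    adj = [min(1.0, p / w) if w > 0 else 1.0 for p, w in zip(pvalues, norm_w)]
    order = sorted(range(m), key=lambda i: adj[i])
    # Step-up rule: find the largest rank k with adj_(k) <= alpha * k / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if adj[i] <= alpha * rank / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


# Toy example (assumed numbers): prior knowledge favours the first three tests.
pvals = [0.001, 0.02, 0.03, 0.04, 0.5, 0.8]
unweighted = weighted_bh(pvals, [1, 1, 1, 1, 1, 1], alpha=0.05)
weighted = weighted_bh(pvals, [2, 2, 2, 0, 0, 0], alpha=0.05)
print(sum(unweighted), "rejections without weights,", sum(weighted), "with weights")
```

The weights must be chosen without looking at the p-values themselves, otherwise the error control of the procedure is no longer guaranteed.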
With the plethora of datasets available, we can use them in an empirical Bayes way. Such an approach would enable iterative rounds of algorithm and model improvement, and data mining for new discovery.

C.3.3 Translational Statistics: Bioconductor

This is one of the most difficult open problems in statistical research: how to rapidly produce robust software that solves a burning scientific question and share it with biomedical scientists. This question has been driving my research since I started in bioinformatics over 15 years ago, and my approach is embedded in the international Bioconductor collaboration [1]. Much of the infrastructure of the Bioconductor project (archive, build system, website) is managed by Martin Morgan at the Roswell Park Cancer Institute in Buffalo, NY. Its scientific content, however, is driven by groups in multiple locations. I have a track record in algorithm development and flagship biological applications and I aim to maintain this role.
• I will maintain the software engineering work in my group. Our aim is to increase the usability of scientific software in terms of documentation, performance, robustness and interoperability.
• I will continue to organise interdisciplinary training courses, such as the Brixen and EMBO courses (see Section F).
• I will continue to organise the European Bioconductor Developer Workshops and help with similar events in other parts of the world.

Scientific computing evolves rapidly. Although my work is strongly associated with R, this is no dogma. Our challenge will be to provide effective software platforms for computational biology in the medium-term future, while safeguarding the investments that have been made (e. g., into R, CRAN and Bioconductor). Notably, over the last few years R has turned from an academic curiosity into a commercial-grade infrastructure¹⁷.
This is excellent news for bioinformatics since the field will benefit from enormous commercial investments that would be unimaginable with research funding. Nevertheless, I will also keenly monitor developments on other fronts, such as Julia and JavaScript¹⁸.

Particular fields of focus of future work will be data wrangling, cloud computing and visualisation.

Data wrangling is the process of converting data from one (raw) form into another form that allows for consumption of the data by downstream tools for analysis and integration. It often takes up the majority of the time of an applied analysis project¹⁹. There has recently been remarkable progress in this area, epitomized by the Hadleyverse²⁰. Our particular challenge will be to merge the useful concept of tidy data²¹ with concepts that have made Bioconductor successful, including self-contained and self-documenting data sets, encapsulation, abstraction and provision of sufficient metadata.

¹⁷ As evident e. g. from the formation of the R consortium, the acquisition of Revolution by Microsoft, the professional refinement of R by RStudio, or the fact that leading high-tech companies including Facebook, Google, SAP hire R programmers.
¹⁸ There is a friendly relationship between R and JavaScript as both derive from LISP / Scheme.
¹⁹ http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
²⁰ http://www.r-bloggers.com/welcome-to-the-hadleyverse
²¹ Tidy Data. Hadley Wickham, Journal of Statistical Software 59:10 (2014)

Cloudification of resources is a general trend in the computing world that offers cost savings and increased efficiency. Naturally, it is also affecting bioinformatics. I see our role here not to invent, but to lead the field by showing how to adapt and specialise generic solutions from the software industry. A recent example is our provision of Docker containers for an RNA-seq workflow²².
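To make the tidy-data idea from the data wrangling paragraph concrete, the following sketch reshapes a small features × samples count table into one record per observation (each variable a column, each observation a row) and attaches the sample metadata to every record. It uses only plain Python; the function name `wide_to_tidy`, the gene and sample names and the counts are made up for illustration and do not represent a Bioconductor data structure.

```python
from typing import Dict, List


def wide_to_tidy(counts: Dict[str, Dict[str, int]],
                 sample_info: Dict[str, Dict[str, str]]) -> List[Dict[str, object]]:
    """Turn a gene -> sample -> count table into one record per (gene, sample).

    Hypothetical helper: sample-level metadata is copied into each record,
    so every row is self-describing.
    """
    rows: List[Dict[str, object]] = []
    for gene in sorted(counts):
        for sample in sorted(counts[gene]):
            row: Dict[str, object] = {
                "gene": gene,
                "sample": sample,
                "count": counts[gene][sample],
            }
            # Missing metadata simply adds no extra columns to this row.
            row.update(sample_info.get(sample, {}))
            rows.append(row)
    return rows


# Assumed toy data: two genes measured in two samples.
counts = {
    "geneA": {"s1": 12, "s2": 30},
    "geneB": {"s1": 0, "s2": 7},
}
sample_info = {
    "s1": {"condition": "control"},
    "s2": {"condition": "treated"},
}

tidy = wide_to_tidy(counts, sample_info)
for record in tidy:
    print(record)
```

The long form is convenient for generic plotting and modelling tools, at the price of repeating metadata in every row; keeping counts and metadata together in one self-describing object is the complementary design that the paragraph above attributes to Bioconductor.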
Scientific visualisation has so far been remarkably conservative, presumably due to the overall conservativeness of the scientific publication process, which is still centred around “papers” (equivalently: self-contained, printable PDF files). Nevertheless, future generations may learn to make better use of interactive, computer-aided visualisations and modern web technologies, and I plan to leverage such new developments from the wider computing world for scientific data visualisation and exploration.

²² https://hub.docker.com/r/vladkim/rnaseq

D Publications 2012-16

ᴾ indicates equal contributions, ᴮ co-corresponding authorships. See also http://www.huber.embl.de/publications. Bibliometry is available, e. g., from Google Scholar.

Corresponding author papers 2012–16

[1] Orchestrating high-throughput genomic analysis with Bioconductor. Wolfgang Huberᴮ, Vincent J. Carey, Robert Gentleman, Simon Anders, Marc Carlson, Benilton S. Carvalho, Hector Corrada Bravo, Sean Davis, Laurent Gatto, Thomas Girke, Raphael Gottardo, Florian Hahne, Kasper D. Hansen, Rafael A. Irizarry, Michael Lawrence, Michael I. Love, James MacDonald, Valerie Obenchain, Andrzej K. Oleś, Hervé Pagès, Alejandro Reyes, Paul Shannon, Gordon K. Smyth, Dan Tenenbaum, Levi Waldron, and Martin Morgan. Nature Methods, 12:115–121, 2015 (35 citations²³).
[2] A map of directional genetic interactions in a metazoan cell. Bernd Fischerᴾ, Thomas Sandmannᴾ, Thomas Hornᴾ, Maximilian Billmannᴾ, Varun Chaudhary, Wolfgang Huberᴮ, and Michael Boutrosᴮ. eLife, 4, 2015.
[3] A chemical-genetic interaction map of small molecules using high-throughput imaging in cancer cells. Marco Breinigᴾ, Felix A. Kleinᴾ, Wolfgang Huberᴮ, and Michael Boutrosᴮ. Molecular Systems Biology, 11(12), 2015.
[4] Single-cell transcriptome analysis reveals coordinated ectopic gene-expression patterns in medullary thymic epithelial cells.
Philip Brenneckeᴾ, Alejandro Reyesᴾ, Sheena Pintoᴾ, Kristin Rattayᴾ, Michelle Nguyen, Rita Küchler, Wolfgang Huberᴮ, Bruno Kyewskiᴮ, and Lars M. Steinmetzᴮ. Nature Immunology, 16:933–941, 2015.
[5] TimerQuant: A modelling approach to tandem fluorescent timer design and data interpretation for measuring protein turnover in embryos. Joseph D. Barry, Erika Donà, Darren Gilmour, and Wolfgang Huber. Development, 143(1):174–179, 2016.
[6] SomaticSignatures: inferring mutational signatures from single-nucleotide variants. Julian S. Gehring, Bernd Fischer, Michael Lawrence, and Wolfgang Huber. Bioinformatics, 31(22):3673–3675, 2015.
[7] FourCSeq: Analysis of 4C sequencing data. Felix A. Klein, Tibor Pakozdi, Simon Anders, Yad Ghavi-Helm, Eileen E. M. Furlong, and Wolfgang Huber. Bioinformatics, 31(19):3085–3091, 2015.
[8] RNA-Seq workflow: gene-level exploratory analysis and differential expression. Michael I. Love, Simon Anders, Vladislav Kim, and Wolfgang Huber. F1000Research, 4(1070), 2015.
[9] Thermal proteome profiling for unbiased identification of direct and indirect drug targets using multiplexed quantitative mass spectrometry. Holger Frankenᴾ, Toby Mathiesonᴾ, Dorothee Childsᴾ, Gavain M.A. Sweetmanᴾ, Thilo Werner, Ina Tögel, Carola Doce, Stephan Gade, Marcus Bantscheff, Gerard Drewes, Friedrich B.M. Reinhardᴮ, Wolfgang Huberᴮ, and Mikhail M. Savitskiᴮ. Nature Protocols, 10(10):1567–1593, 2015.
[10] Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2. Michael I. Love, Wolfgang Huber, and Simon Anders. Genome Biology, 15(12):550, 2014 (112 citations).

²³ Source: ISI Web of Science.

[11] HTSeq – a Python framework to work with high-throughput sequencing data. Simon Anders, Paul Theodor Pyl, and Wolfgang Huber. Bioinformatics, 31(2):166–169, 2015 (315 citations).
[12] h5vc: scalable nucleotide tallies with HDF5.
Paul Theodor Pyl, Julian Gehring, Bernd Fischer, and Wolfgang Huber. Bioinformatics, 30(10):1464–1466, 2014.
[13] Transcriptome-wide profiling and posttranscriptional analysis of hematopoietic stem/progenitor cell differentiation toward myeloid commitment. Daniel Klimmeckᴾ, Nina Cabezas-Wallscheidᴾ, Alejandro Reyesᴾ, Lisa von Paleske, Simon Renders, Jenny Hansson, Jeroen Krijgsveld, Wolfgang Huberᴮ, and Andreas Trumppᴮ. Stem Cell Reports, 3(5):858–875, 2014.
[14] Mapping genetic interactions in human cancer cells with RNAi and multiparametric phenotyping. Christina Lauferᴾ, Bernd Fischerᴾ, Maximilian Billmann, Wolfgang Huberᴮ, and Michael Boutrosᴮ. Nature Methods, 10:427–431, 2013 (37 citations).
[15] Drift and conservation of differential exon usage across tissues in primate species. Alejandro Reyesᴾ, Simon Andersᴾ, Robert J. Weatheritt, Toby J. Gibson, Lars M. Steinmetz, and Wolfgang Huber. Proc. Natl. Acad. Sci. U.S.A., 110(38):15377–15382, 2013 (11 citations).
[16] Dynamical modelling of phenotypes in a genome-wide RNAi live-cell imaging assay. Gregoire Pau, Thomas Walter, Beate Neumann, Jean-Karim Heriché, Jan Ellenberg, and Wolfgang Huber. BMC Bioinformatics, 14(1):308, 2013.
[17] The Genomic and Transcriptomic Landscape of a HeLa Cell Line. Jonathan Landryᴾ, Paul Theodor Pylᴾ, Tobias Rausch, Thomas Zichner, Manu M. Tekkedil, Adrian M. Stütz, Anna Jauch, Raeka S. Aiyar, Gregoire Pau, Nicolas Delhomme, Julien Gagneur, Jan O. Korbel, Wolfgang Huberᴮ, and Lars M. Steinmetzᴮ. G3 (Bethesda), 3(8), 2013 (85 citations).
[18] Detecting differential usage of exons from RNA-Seq data. Simon Andersᴾ, Alejandro Reyesᴾ, and Wolfgang Huber. Genome Research, 22:2008–2017, 2012 (170 citations).

Collaborative papers 2012–16

[19] A genetic interaction map of cell cycle regulators.
Maximilian Billmannᴾ, Thomas Hornᴾ, Bernd Fischer, Thomas Sandmann, Wolfgang Huber, and Michael Boutros. Molecular Biology of the Cell, 2016.
[20] Myc depletion induces a pluripotent dormant state mimicking diapause. Roberta Scognamiglio, Nina Cabezas-Wallscheid, Marc Christian Thier, Sandro Altamura, Alejandro Reyes, Áine M. Prendergast, Daniel Baumgärtner, Larissa S. Carnevalli, Ann Atzberger, Simon Haas, Lisa von Paleske, Thorsten Boroviak, Philipp Wörsdörfer, Marieke A.G. Essers, Ulrich Kloz, Robert N. Eisenman, Frank Edenhofer, Paul Bertone, Wolfgang Huber, Franciscus van der Hoeven, Austin Smith, and Andreas Trumpp. Cell, 164(4):668–680, 2016.
[21] Landscape and dynamics of transcription initiation in the malaria parasite Plasmodium falciparum. Sophie H. Adjalley, Christophe D. Chabbert, Bernd Klaus, Vicent Pelechano, and Lars M. Steinmetz. Cell Reports, 14(10):2463–2475, 2016.
[22] Nuclear architecture organized by Rif1 underpins the replication-timing program. Rossana Foti, Stefano Gnan, Daniela Cornacchia, Vishnu Dileep, Aydan Bulut-Karslioglu, Sarah Diehl, Andreas Buness, Felix A. Klein, Wolfgang Huber, Ewan Johnstone, Remco Loos, Paul Bertone, David M. Gilbert, Thomas Manke, Thomas Jenuwein, and Sara C.B. Buonomo. Molecular Cell, 61(2):260–273, 2016.
[23] CYP3A5 mediates basal and acquired therapy resistance in different subtypes of pancreatic ductal adenocarcinoma. Elisa M Noll, Christian Eisen, Albrecht Stenzinger, Elisa Espinet, Alexander Muckenhuber, Corinna Klein, Vanessa Vogel, Bernd Klaus, Wiebke Nadler, Christoph Rösli, Christian Lutz, Michael Kulke, Jan Engelhardt, Franziska M Zickgraf, Octavio Espinosa, Matthias Schlesner, Xiaoqi Jiang, Annette Kopp-Schneider, Peter Neuhaus, Marcus Bahra, Bruno V Sinn, Roland Eils, Nathalia A Giese, Thilo Hackert, Oliver Strobel, Jens Werner, Markus W Büchler, Wilko Weichert, Andreas Trumpp, and Martin R Sprick. Nature Medicine, 22:278–287, 2016.
[24] An optogenetic method to modulate cell contractility during tissue morphogenesis. Giorgia Guglielmi, Joseph D. Barry, Wolfgang Huber, and Stefano De Renzis. Developmental Cell, 35(5):646–660, 2015.
[25] Improved binding site assignment by high-resolution mapping of RNA-protein interactions using iCLIP. Christian Hauer, Tomaz Curk, Simon Anders, Thomas Schwarzl, Anne-Marie Alleaume, Jana Sieber, Ina Hollerer, Madhuri Bhuvanagiri, Wolfgang Huber, Matthias W. Hentze, and Andreas E. Kulozik. Nature Communications, 6(7921), 2015.
[26] The RNA-binding proteomes from yeast to man harbour conserved enigmRBPs. Benedikt M. Beckmann, Rastislav Horos, Bernd Fischer, Alfredo Castello, Katrin Eichelbaum, Anne-Marie Alleaume, Thomas Schwarzl, Tomaz Curk, Sophia Foehr, Wolfgang Huber, Jeroen Krijgsveld, and Matthias W. Hentze. Nature Communications, 6(10127), 2015.
[27] Thermal proteome profiling monitors ligand interactions with cellular membrane proteins. Friedrich B.M. Reinhard, Dirk Eberhard, Thilo Werner, Holger Franken, Dorothee Childs, Carola Doce, Maria Fälth Savitski, Wolfgang Huber, Marcus Bantscheff, Mikhail M. Savitski, and Gerard Drewes. Nature Methods, 2015.
[28] Mutational landscape and complexity in CLL. Thorsten Zenz and Wolfgang Huber. Blood, 126(18):2078–2079, 2015.
[29] Expression atlas update–an integrated database of gene and protein expression in humans, animals and plants. Robert Petryszak, Maria Keays, Y. Amy Tang, Nuno A. Fonseca, Elisabet Barrera, Tony Burdett, Anja Füllgrabe, Alfonso Muñoz-Pomer Fuentes, Simon Jupp, Satu Koskinen, Oliver Mannion, Laura Huerta, Karine Megy, Catherine Snow, Eleanor Williams, Mitra Barzine, Emma Hastings, Hendrik Weisser, James Wright, Pankaj Jaiswal, Wolfgang Huber, Jyoti Choudhary, Helen E. Parkinson, and Alvis Brazma. Nucleic Acids Research, 44(1):D746–D752, 2016.
[30] Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Fabian Grubert, Judith B. Zaugg, Maya Kasowski, Oana Ursu, Damek V. Spacek, Alicia R. Martin, Peyton Greenside, Rohith Srivas, Doug H. Phanstiel, Aleksandra Pekowska, Nastaran Heidari, Ghia Euskirchen, Wolfgang Huber, Jonathan K. Pritchard, Carlos D. Bustamante, Lars M. Steinmetz, Anshul Kundaje, and Michael Snyder. Cell, 162(5):1051–1065, 2015.
[31] Recurrent CDKN1B (p27) mutations in hairy cell leukemia. Sascha Dietrich, Jennifer Hüllein, Stanley Chun-Wei Lee, Barbara Hutter, David Gonzalez, Sandrine Jayne, Martin J. S. Dyer, Małgorzata Oleś, Monica Else, Xiyang Liu, Mikołaj Słabicki, Bian Wu, Xavier Troussard, Jan Dürig, Mindaugas Andrulis, Claire Dearden, Christof von Kalle, Martin Granzow, Anna Jauch, Stefan Fröhling, Wolfgang Huber, Manja Meggendorfer, Torsten Haferlach, Anthony D. Ho, Daniela Richter, Benedikt Brors, Hanno Glimm, Estella Matutes, Omar Abdel Wahab, and Thorsten Zenz. Blood, 126(8):1005–1008, 2015.
[32] Single-cell polyadenylation site mapping reveals 3’ isoform choice variability. Lars Velten, Simon Anders, Aleksandra Pekowska, Aino I Järvelin, Wolfgang Huber, Vicent Pelechano, and Lars M. Steinmetz. Molecular Systems Biology, 11(6), 2015.
[33] BRAF inhibitor therapy in HCL. Sascha Dietrich and Thorsten Zenz. Best Practice & Research Clinical Haematology, 28(4):246–252, 2015.
[34] A high-throughput ChIP-Seq for large-scale chromatin studies. Christophe D Chabbert, Sophie H Adjalley, Bernd Klaus, Emilie S Fritsch, Ishaan Gupta, Vicent Pelechano, and Lars M. Steinmetz. Molecular Systems Biology, 11(1), 2015.
[35] A novel inflammatory pathway mediating rapid hepcidin-independent hypoferremia. Claudia Guida, Sandro Altamura, Felix A. Klein, Bruno Galy, Michael Boutros, Artur J. Ulmer, Matthias W. Hentze, and Martina U. Muckenthaler. Blood, 125(14):2265–2275, 2015 (13 citations).
[36] Fundamental physical cellular constraints drive self-organization of tissues. Daniel Sánchez-Gutiérrez, Melda Tozluoglu, Joseph D. Barry, Alberto Pascual, Yanlan Mao, and Luis M Escudero. The EMBO Journal, 35(1):77–88, 2015.
[37] An open data ecosystem for cell migration research. Paola Masuzzo, Lennart Martens, Christophe Ampe, Kurt I. Anderson, Joseph Barry, Olivier De Wever, Olivier Debeir, Christine Decaestecker, Helmut Dolznig, Peter Friedl, Cedric Gaggioli, Benjamin Geiger, Ilya G. Goldberg, Elias Horn, Rick Horwitz, Zvi Kam, Sylvia E. Le Dévédec, Danijela Matic Vignjevic, Josh Moore, Jean-Christophe Olivo-Marin, Erik Sahai, Susanna A. Sansone, Victoria Sanz-Moreno, Staffan Strömblad, Jason Swedlow, Johannes Textor, Marleen Van Troys, and Roman Zantl. Trends in Cell Biology, 25(2):55–58, 2015.
[38] Statistical relevance – relevant statistics, part I. Bernd Klaus. The EMBO Journal, 34(22):2727–2730, 2015.
[39] A discrete transition zone organizes the topological and regulatory autonomy of the adjacent Tfap2c and Bmp7 genes. Taro Tsujimura, Felix A. Klein, Katja Langenfeld, Juliane Glaser, Wolfgang Huber, and François Spitz. PLoS Genetics, 11(1):e1004897, 2015.
[40] Cell-to-cell expression variability followed by signal reinforcement progressively segregates early mouse lineages. Yusuke Ohnishi, Wolfgang Huber, Akiko Tsumura, Minjung Kang, Panagiotis Xenopoulos, Kazuki Kurimoto, Andrzej K. Oleś, Marcos J. Araúzo-Bravo, Mitinori Saitou, Anna-Katerina Hadjantonakis, and Takashi Hiiragi. Nature Cell Biology, 16(1):27–37, 2014 (49 citations).
[41] Protein quality control at the inner nuclear membrane. Anton Khmelinskii, Marina Pantazopoulou, Bernd Fischer, Deike J. Omnus, Gaëlle Le Dez, Audrey Brossard, Alexander Gunnarsson, Joseph D. Barry, Matthias Meurer, Daniel Kirrmaier, Charles Boone, Wolfgang Huber, Gwenaël Rabut, Per O. Ljungdahl, and Michael Knop. Nature, 516(7531):410–413, 2014.
[42] Enhancer loops appear stable during development and are associated with paused polymerase. Yad Ghavi-Helm, Felix A. Kleinᴾ, Tibor Pakozdiᴾ, Lucia Ciglar, Daan Noordermeer, Wolfgang Huber, and Eileen E. M. Furlong. Nature, 512(7512):96–100, 2014 (51 citations).
[43] Identification of regulatory networks in HSCs and their immediate progeny via integrated proteome, transcriptome, and DNA methylome analysis. Nina Cabezas-Wallscheid, Daniel Klimmeck, Jenny Hansson, Daniel B Lipka, Alejandro Reyes, Qi Wang, Dieter Weichenhan, Amelie Lier, Lisa von Paleske, Simon Renders, Peer Wünsche, Petra Zeisberger, David Brocks, Lei Gu, Carl Herrmann, Simon Haas, Marieke A G Essers, Benedikt Brors, Roland Eils, Wolfgang Huber, Michael D Milsom, Christoph Plass, Jeroen Krijgsveld, and Andreas Trumpp. Cell Stem Cell, 15(4):507–522, 2014 (24 citations).
[44] Measuring genetic interactions in human cells by RNAi and imaging. Christina Laufer, Bernd Fischer, Wolfgang Huber, and Michael Boutros. Nature Protocols, 9(10):2341–2353, 2014.
[45] Alternative polyadenylation diversifies post-transcriptional regulation by selective RNA–protein interactions. Ishaan Gupta, Sandra Clauder-Münster, Bernd Klaus, Aino I Järvelin, Raeka S. Aiyar, Vladimir Benes, Stefan Wilkening, Wolfgang Huber, Vicent Pelechano, and Lars M. Steinmetz. Molecular Systems Biology, 10(2), 2014 (12 citations).
[46] Expression Atlas update–a database of gene and transcript expression from microarray- and sequencing-based functional genomics experiments. Robert Petryszak, Tony Burdett, Benedetto Fiorelli, Nuno A. Fonseca, Mar Gonzalez-Porta, Emma Hastings, Wolfgang Huber, Simon Jupp, Maria Keays, Nataliya Kryvych, Julie McMurry, John C. Marioni, James Malone, Karine Megy, Gabriella Rustici, Amy Y. Tang, Jan Taubert, Eleanor Williams, Oliver Mannion, Helen E. Parkinson, and Alvis Brazma. Nucleic Acids Research, 42(1):D926–932, 2014 (63 citations).
[47] A genome-wide map of mitochondrial DNA recombination in yeast. Emilie S. Fritsch, Christophe D. Chabbert, Bernd Klaus, and Lars M. Steinmetz. Genetics, 198(2):755–771, 2014.
[48] Directional tissue migration through a self-generated chemokine gradient. Erika Donà, Joseph D. Barry, Guillaume Valentin, Charlotte Quirin, Anton Khmelinskii, Andreas Kunze, Sevi Durdu, Lionel R. Newton, Ana Fernandez-Minan, Wolfgang Huber, Michael Knop, and Darren Gilmour. Nature, 503(7475):285–289, 2013 (58 citations).
[49] Accounting for technical noise in single-cell RNA-seq experiments. Philip Brenneckeᴾ, Simon Andersᴾ, Jong Kyoung Kimᴾ, Aleksandra A. Kolodziejczyk, Xiuwei Zhang, Valentina Proserpio, Bianka Baying, Vladimir Benes, Sarah A. Teichmann, John C. Marioni, and Marcus G. Heisler. Nature Methods, 10(11):1093–1095, 2013 (77 citations).
[50] Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Simon Anders, Davis J McCarthy, Yunshun Chen, Michal Okoniewski, Gordon K Smyth, Wolfgang Huber, and Mark D Robinson. Nature Protocols, 8(9):1765–1786, 2013 (136 citations).
[51] An Evaluation of High-Throughput Approaches to QTL Mapping in Saccharomyces cerevisiae. Stefan Wilkening, Gen Lin, Emilie S. Fritsch, Manu M. Tekkedil, Simon Anders, Raquel Kuehn, Michelle Nguyen, Raeka S. Aiyar, Michael Proctor, Nikita A. Sakhanenko, David J. Galas, Julien Gagneur, Adam Deutschbauer, and Lars M. Steinmetz. Genetics, 196(3):853–865, 2014 (11 citations).
[52] High-content siRNA screen reveals global ENaC regulators and potential cystic fibrosis therapy targets. Joana Almaçaᴾ, Diana Fariaᴾ, Marisa Sousa, Inna Uliyakina, Christian Conrad, Lalida Sirianant, Luka A. Clarke, José Paulo Martins, Miguel Santos, Jean-Karim Heriché, Wolfgang Huber, Rainer Schreiber, Rainer Pepperkok, Karl Kunzelmann, and Margarida D. Amaral. Cell, 154(6):1390–1400, 2013 (14 citations).
[53] Software for computing and annotating genomic ranges. Michael Lawrence, Wolfgang Huber, Hervé Pagès, Patrick Aboyoun, Marc Carlson, Robert Gentleman, Martin T. Morgan, and Vincent J. Carey. PLoS Computational Biology, 9(8):e1003118, 2013 (92 citations).
[54] CellH5: a format for data exchange in high-content screening. Christoph Sommer, Michael Held, Bernd Fischer, Wolfgang Huber, and Daniel W. Gerlich. Bioinformatics, 29:1580–1582, 2013.
[55] Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments with small sample size. Danni Yu, Wolfgang Huber, and Olga Vitek. Bioinformatics, 29:1275–1282, 2013.
[56] Control of tissue morphology by Fasciclin III-mediated intercellular adhesion. Richard E. Wellsᴾ, Joseph D. Barryᴾ, Simon Cuhlmann, Paul Evans, Wolfgang Huber, David Strutt, and Martin P. Zeidler. Development, 140:3858–3868, 2013.
[57] Direct competition between hnRNP C and U2AF65 protects the transcriptome from the exonization of Alu elements. Kathi Zarnack, Julian König, Mojca Tajnik, Inigo Martincorena, Sebastian Eustermann, Isabelle Stévant, Alejandro Reyes, Simon Anders, Nicholas M. Luscombe, and Jernej Ule. Cell, 152(3):453–466, 2013 (69 citations).
[58] An efficient method for genome-wide polyadenylation site mapping and RNA quantification. Stefan Wilkening, Vicent Pelechano, Aino I. Järvelin, Manu M. Tekkedil, Simon Anders, Vladimir Benes, and Lars M. Steinmetz. Nucleic Acids Research, 41(5):e65, 2013 (27 citations).
[59] Properties of isotope patterns and their utility for peptide identification in large-scale proteomic experiments. Satoshi Okawa, Bernd Fischer, and Jeroen Krijgsveld. Rapid Communications in Mass Spectrometry, 27(9):1067–1075, 2013.
[60] RNA-binding proteins in Mendelian disease. Alfredo Castello, Bernd Fischer, Matthias W Hentze, and Thomas Preiss. Trends in Genetics, 29:318–327, 2013 (43 citations).
[61] System-wide identification of RNA-binding proteins by interactome capture. Alfredo Castello, Rastislav Horos, Claudia Strein, Bernd Fischer, Katrin Eichelbaum, Lars M. Steinmetz, Jeroen Krijgsveld, and Matthias W Hentze. Nature Protocols, 8(3):491–500, 2013 (26 citations).
[62] Biggest challenges in bioinformatics. Jonathan C Fuller, Pierre Khoueiry, Holger Dinkel, Kristoffer Forslund, Alexandros Stamatakis, Joseph Barry, Aidan Budd, Theodoros G Soldatos, Katja Linssen, and Abdul Mateen Rajput. EMBO reports, 14(4):302–304, 2013.
[63] The RNA-binding protein repertoire of embryonic stem cells. S Chul Kwon, Hyerim Yi, Katrin Eichelbaum, Sophia Föhr, Bernd Fischer, Kwon Tae You, Alfredo Castello, Jeroen Krijgsveld, Matthias W Hentze, and V Narry Kim. Nature Structural and Molecular Biology, 2013 (69 citations).
[64] Highly coordinated proteome dynamics during reprogramming of somatic cells to pluripotency. Jenny Hansson, Mahmoud Reza Rafiee, Sonja Reiland, Jose M. Polo, Julian Gehring, Satoshi Okawa, Wolfgang Huber, Konrad Hochedlinger, and Jeroen Krijgsveld. Cell Reports, 2(6):1579–1592, 2012 (67 citations).
[65] A cross-platform toolkit for mass spectrometry and proteomics. Matthew C Chambers, Brendan Maclean, Robert Burke, Dario Amodei, Daniel L Ruderman, Steffen Neumann, Laurent Gatto, Bernd Fischer, Brian Pratt, Jarrett Egertson, Katherine Hoff, Darren Kessner, Natalie Tasman, Nicholas Shulman, Barbara Frewen, Tahmina A Baker, Mi-Youn Brusniak, Christopher Paulse, David Creasy, Lisa Flashner, Kian Kani, Chris Moulding, Sean L Seymour, Lydia M Nuwaysir, Brent Lefebvre, Frank Kuhlmann, Joe Roark, Rainer Paape, Detlev Suckau, Tina Hemenway, Andreas Huhmer, James Langridge, Brian Connolly, Trey Chadick, Krisztina Holly, Josh Eckels, Eric W Deutsch, Robert L Moritz, Jonathan E Katz, David B Agus, Michael MacCoss, David L Tabb, and Parag Mallick. Nature Biotechnology, 30(10):918–920, 2012 (175 citations).
[66] Insights into RNA biology from an atlas of mammalian mRNA-binding proteins. Alfredo Castello, Bernd Fischer, Katrin Eichelbaum, Rastislav Horos, Benedikt M. Beckmann, Claudia Strein, Norman E. Davey, David T. Humphreys, Thomas Preiss, Lars M. Steinmetz, Jeroen Krijgsveld, and Matthias W. Hentze. Cell, 149:1393–1406, 2012. pdf, url (319 citations). [67] Tandem fluorescent protein timers for in vivo analysis of protein dynamics. Anton Khmelinskii, Philipp J. Keller, Anna Bartosik, Matthias Meurer, Joseph D. Barry, Balca R. Mardin, Andreas Kaufmann, Susanne Trautmann, Malte Wachsmuth, Gislene Pereira, Wolfgang Huber, Elmar Schiebel, and Michael Knop. Nature Biotechnology, 30:708–714, 2012. pdf, url (49 citations).

Preprints

[68] Data-driven hypothesis weighting increases detection power in big data analytics. Nikolaos Ignatiadis, Bernd Klaus, Judith Zaugg, and Wolfgang Huber. bioRχiv, 2015. pdf, url. [69] Neural lineage induction reveals multi-scale dynamics of 3D chromatin organization. Aleksandra Pekowska, Bernd Klaus, Felix Alexander Klein, Simon Anders, Małgorzata Oleś, Lars M. Steinmetz, Paul Bertone, and Wolfgang Huber. bioRχiv, 2014. pdf, url. [70] Mutated SF3B1 is associated with transcript isoform changes of the genes UQCC and RPL31 both in CLLs and uveal melanomas. Alejandro Reyes, Carolin Blume, Vicent Pelechano, Petra Jakob, Lars M. Steinmetz, Thorsten Zenz, and Wolfgang Huber. bioRχiv, 2014. pdf, url.

E List of External Grants (since 2012)

Duration. Name – Funding Body. Role. Topic

2015-18, 36 months. SOUND (Statistical Multi-Omics Understanding) – Collaborative research project, Horizon 2020 Research and Innovation programme Personalising Health and Care, European Commission. I am the coordinator and lead one scientific work-package. Topic: to create the bioinformatic tools for statistically informed use of personal genomic and other omic data in medicine.
2011-15, 60 months. Systems Microscopy – Network of Excellence, FP7-HEALTH-2010, European Commission. I led three RTD work packages and was part of the Executive Board. Topic: data-driven modelling of cell biological processes from live cell imaging data.
2012-15, 36 months. RADIANT – Collaborative research project, FP7-HEALTH-2010, European Commission. I was scientific co-coordinator (with Magnus Rattray and Neil Lawrence) and led two RTD work packages. Topic: statistical methods for high-throughput sequencing technologies.
2013-15, 24 months. BIGDATA (Scalable Statistical Computing for Emerging Omics Data Streams) – US National Science Foundation (NSF) Mid-scale project: DA: ESCE: Collaborative Research. I was co-investigator. Topic: scaling statistical methods and Bioconductor software for large ‘omics data streams.
2015-17, 24 months. BioTop (Bioinformatic tool harmonization for personalized cancer care) – BMBF. I am a co-investigator, responsible for RNA-seq data types. Topic: standardising methods for analysing high-throughput sequencing data for translational cancer research.
2016-19, 36 months. TRANSCAN GCH-CLL (Translational research on human tumour heterogeneity to overcome recurrence and resistance to therapy) – ERA-NET on translational Cancer Research (TRANSCAN) project. I am a co-investigator, responsible for computational aspects. Topic: intra-tumour heterogeneity in chronic lymphocytic leukaemia.
2014-17, 36 months. GSK postdoc fellowship – Cellzome GmbH. Academic partner. Topic: 3-year postdoc project on developing computational and statistical methods for thermal proteome profiling.
2015-19, 60 months. HD-HuB (Heidelberg Centre for Human Bioinformatics) – BMBF. Co-investigator, contributing a work-package on R/Bioconductor-based workflows.

F Curriculum Vitae

Wolfgang Huber
European Molecular Biology Laboratory (EMBL), D 69117 Heidelberg
∗ 28.5.1968 in Bad Säckingen, nationality: Germany
www.huber.embl.de

Positions

EMBL: Research group leader
Dec 2011 - present: Senior Scientist
Heidelberg, Mar 2009 - present: Genome Biology Unit
Cambridge (UK), Sep 2004 - Feb 2009: European Bioinformatics Institute (EBI)
DKFZ Heidelberg, Mar 2000 - Sep 2004: Postdoc cancer transcriptomics
IBM Research Almaden, San Jose (California), Jun 1998 - Dec 1999: Postdoc cheminformatics
University of Freiburg, Oct 1994 - May 1998: Research and teaching assistant, Faculty of Physics
Univ. Clinic Freiburg, Sep 1991 - Dec 1997: Research assistant, Department of Neurology

Education

1998, Univ. of Freiburg: Dr. rer. nat. (Theoretical Physics). Thesis: Dynamics of strongly driven open quantum systems
1994, Univ. of Freiburg: Diplom (Physics). Minor in Mathematics (Probability and Statistics)
1990/91, Univ. of Edinburgh: Non-graduating exchange student, Physics
1990, Univ. of Freiburg: Vordiplom (Physics). Minors in Mathematics and Chemistry

Academic Services – external

Journal reviewing: Bioinformatics, Biostatistics, Cell Reports, EMBO Reports, FEBS Letters, Genome Biology, G3 (Genes | Genomes | Genetics), Genome Research, Methods, Molecular Systems Biology, Nature, Nature Biotechnology, Nature Cell Biology, Nature Methods, Nucleic Acids Research, PLoS ONE, Science, Science Translational Medicine; Programme Committees ECCB 2012, ISMB/ECCB 2013, ECCB 2014
Editorial board: Bioinformatics, Giga Science, F1000Prime
Grant review boards: HFSP Fellowships
Research proposal reviewing: Academy of Finland, ERC, French NCI (INCa), HRCMM, National Science Centre (Poland), Swiss National Science Foundation (SNF), Skolkovo Fund, Stichting Kinderen Kankervrij (Foundation Children Cancerfree), Wellcome Trust, Wiener Wissenschafts-, Forschungs- und Technologiefonds (WWTF), others
Boards:
Scientific Advisory Board (SAB) and Technical Advisory Board: Bioconductor Project (2003 - )
SAB: Sophia Genetics S.A. (CH) (2011 - 2015)
SAB: UMR3244 in Institut Curie (F) (2015 - )
SAB: Graduate School of Quantitative Biosciences Munich (2014 - )
SAB (Observer): Expression Atlas at EBI
Executive Board: Systems Microscopy EC FP7 Network of Excellence (2011 - 2015)
Consulting: Genentech (2010 - 2015), Evotec (2013 - 2014)

Academic Services – within EMBL

Annually since 2007: Coordinator of the ’Omics module of the EMBL International PhD Programme course
2012-2016: Thesis Advisory Committee: >40 students

Conference (co-)organisation

16 - 18 February 2012, EMBL Heidelberg: Omics and Personalised Health, conference (140 participants) with Lars Steinmetz, Lee Hood and Rudi Balling
7 - 8 June 2014, EMBL Heidelberg: Annual meeting of the RADIANT consortium (37 participants) with Magnus Rattray
12 - 13 January 2015, EMBL Heidelberg: Bioconductor European Developer Conference (44 participants) with Martin Morgan
31 May - 5 June 2015, Centro Stefano Franscini, Ascona, CH: Workshop on Statistical Learning of Biological Systems from Perturbations (55 participants) with Niko Beerenwinkel, Peter Bühlmann
16 - 19 November 2015, EMBL Heidelberg: Stanford - EMBL conference: Omics and Personalised Health (150 participants) with Lars Steinmetz, Judith Zaugg, Michael Snyder, Peer Bork, Jan Ellenberg
24 - 25 November 2015, CR UK Manchester Institute: C1omics - Single Cell ’Omics (57 participants) with Magnus Rattray, Crispin Miller
19 - 21 May 2016, DKFZ Heidelberg: Cancer Systems Genetics with Claudia Scholl, Stefan Fröhling, Michael Boutros
6 - 8 June 2016, EMBL: Perspectives in Translational Medicine, EMBL Partnership Conference with Plamena Markova, Andreas Kulozik, Luis Serrano, Kjetil Tasken, Matthias Wilmanns
4 September 2016, The Hague, NL: Clinical Bioinformatics as a Service, ECCB Workshop with Niko Beerenwinkel, Daniel Stekhoven, Simon Tavaré

Teaching

2 - 6 July 2012, 23 - 28 June 2013, 22 - 27 June 2014, 14 - 19 June 2015, 10 - 15 July 2016: CSAMA Summer School: Statistics and Computing in Genome Data Science, Brixen, South Tyrol
17 - 22 October 2012: EMBO Practical Course: Analysis and informatics of transcriptomics data, Shenzhen, China
24 - 25 January 2013, 16 - 16 January 2015, 25 - 26 February 2016: EMBL Practical Course: Advanced R programming, EMBL Heidelberg
9 September 2012: ECCB Tutorial – Reads to Biological Patterns: End-to-End Differential Expression Analysis of RNA Sequencing Data Using Bioconductor
7 September 2014: ECCB Workshop – Analysis of Differential Isoform Usage by RNA-seq: Statistical Methodologies and Open Software
3 - 8 March 2013: EMBO Practical Course: High-throughput RNAi, EMBL/DKFZ Heidelberg
29 October - 3 November 2012, 20 - 24 October 2014, 19 - 23 October 2015, 5 - 9 September 2016: EMBO Practical Course: Analysis and informatics of transcriptomics data, EBI-EMBL, Hinxton, UK
15 - 20 October 2012, 20 - 26 October 2014, 17 - 22 October 2016: EMBO Practical Course: High-Throughput Microscopy for Systems Biology, EMBL Heidelberg

Above are the courses that I organised or co-organised, with level of responsibility ranked from top to bottom. I have taught at others, mentioned below.

Selected speaker invitations (2012-16 only)

29 February 2012, Munich, DE: Genomatix GmbH, internal seminar
20 March 2012, Mainz, DE: University, Institute of Molecular Biology, institute seminar
26 June 2012, Augsburg, DE: University, Institute for Mathematics, institute seminar
28 June 2012, Würzburg, DE: University, Institute for Medical Infection Genomics, RNA-seq Workshop
23 - 25 July 2012, Seattle, USA: Bioconductor conference
31 August 2012, Cambridge, UK: From Phenotypes to Pathways, conference
10 - 11 October 2012, Cambridge, UK: Literature-Data Integration, workshop
12 - 13 October 2012, Potsdam, DE: From genomes to networks - New developments in complex disease analysis, annual workshop of the Society for GeneDiagnostics
6 - 7 December 2012, Dresden, DE: Biotec Forum, conference
10 December 2012, Heidelberg, DE: University, Heidelberger Kolloquium Medizinische Biometrie, Informatik und Epidemiologie
11 December 2012, Heidelberg, DE: NGFN Annual Meeting, conference
13 - 14 December 2012, Zurich, CH: Bioconductor Developer Conference
19 March 2013, Palo Alto, USA: Stanford Genome Technology Centre, institute seminar
20 March 2013, South San Francisco, USA: Genentech Inc., internal seminar
8 April 2013, Barcelona, ES: Institute of Predictive and Personalized Medicine of Cancer (IMPPC), institute seminar
24 - 27 April 2013, Freiburg, DE: Preclinical models of cancer: Towards enhanced clinical relevance and predictivity, conference
13 May 2013, Lisbon, PT: University, Instituto de Medicina Molecular (IMM), institute seminar
19 - 24 May 2013, Dagstuhl, DE: Dagstuhl Seminar 13212: Computational Methods Aiding Early-Stage Drug Design
17 - 19 July 2013, Seattle, USA: Bioconductor Conference
23 July 2013, Berlin, DE: ISMB Workshop: Professional Networks in Bioinformatics
11 - 16 August 2013, Banff, CA: BIRS workshop: Statistical Data Integration Challenges in Computational Biology: Regulatory Networks and Personalized Medicine
8 - 11 September 2013, Bertinoro, I: Computational Biology meeting: Computational Cancer Genomics, conference
23 September 2013, Tübingen, DE: Summer School on Machine Learning for Personalized Medicine
25 September 2013, Stockholm, SE: Karolinska Institutet, institute seminar
13 - 19 October 2013, Bedlewo, PL: Autumn school on Computational Aspects of Gene Regulation
28 October - 3 November 2013, Recife, Brazil: RNA-seq course at Brazilian Symposium on Bioinformatics
9 - 10 December 2013, Cambridge, UK: Bioconductor Developer Conference
12 - 13 December 2013, Cambridge, UK: Quantitative Methods in Gene Regulation, conference
9 - 10 January 2014, Paris, F: Institut Curie, institute seminar
16 January 2014, Münster, DE: Max-Planck-Institute for Molecular Biomedicine, institute seminar
12 May 2014, Heidelberg, DE: Cellzome, internal seminar
2 June 2014, Stockholm, SE: Systems Microscopy, conference
2 July 2014, Saarbrücken, DE: Max-Planck-Institute for Informatics, institute seminar
20 October 2014, Munich, DE: LMU, Gene Centre, institute seminar
29 - 31 October 2014, Stockholm, SE: EMBO Workshop on a Systems-Level View of Cytoskeletal Function, conference
27 - 28 November 2014, Helsinki, FI: Institute for Molecular Medicine of Finland (FIMM), institute seminar
29 - 30 January 2015, Zurich, CH: RADIANT workshop
12 - 13 February 2015, Munich, DE: Statistical Methods for Post Genomic Data, conference
17 - 19 February 2015, NYU Abu Dhabi, UAE: Genomics and Systems Biology, conference
25 March 2015, Heidelberg, DE: R User Meeting Rhein-Neckar, workshop
16 - 17 April 2015, Kloster Johannisberg, DE: Cancer Genomics Meets Cancer Proteomics, workshop
9 June 2015, London, UK: Big Data Analytics, conference
20 - 22 July 2015, Seattle, USA: Bioconductor conference
31 July - 2 August 2015, Pozega, Croatia: Summer School of Science
15 - 18 September 2015, Saas-Fee, CH: CERN ROOT 20th anniversary workshop
26 - 27 October 2015, Arlington, USA: NSF workshop on Mathematical Biology
7 - 8 December 2015, Cambridge, UK: Bioconductor Developer Conference
13 January 2016, Heidelberg, DE: University Hospital, Medical Clinic V, institute seminar
22 - 29 January 2016, Bellairs, Barbados: Genetic Networks, workshop
15 February 2016, London, UK: Imperial College BRC Genomics Seminar Series
23 February 2016, Basel, CH: Novartis, internal seminar
8 March 2016, Palo Alto, USA: Stanford University, Department of Statistics, institute seminar
9 March 2016, Claremont, USA: Harvey Mudd College, Biology Colloquium
10 March 2016, Berkeley, USA: UC Berkeley, Department of Statistics, Statistics & Genomics Seminar
16 March 2016, Santa Cruz, USA: UC Santa Cruz, institute seminar
18 March 2016, Mountain View, USA: 23andMe, internal seminar
23 March 2016, Palo Alto, USA: Stanford Genome Technology Center, institute seminar
14 April 2016, Utrecht, NL: Centre for Molecular Medicine, institute seminar
22 April 2016, Mainz, DE: Institute for Medical Biometry, Epidemiology and Informatics (IMBEI), symposium
25 - 27 April 2016, Copenhagen, DK: MedBioinformatics, conference
2 May 2016, Paris, F: High Energy Physics Software Foundation, workshop
30 May - 3 June 2016, Paris, F: École Analyse Génome Tumoral, summer school

Software

See also http://www.huber.embl.de/software

Primary author and maintainer
vsn: microarray normalisation [154]
cellHTS, cellHTS2: RNAi screen normalisation and quality control [123]
tilingArray: transcript discovery and mapping [124]
arrayQualityMetrics: interactive microarray data quality reports [105]

Initiation, co-authorship, supervision
DESeq, DESeq2: RNA-seq differential expression [10] [82]
htseq: processing reads from high-throughput sequencing [11]
IHW: Independent hypothesis weighting [68]
lpsymphony: mixed integer-linear program solver
biomaRt: programmatic access to BioMarts [131]
EBImage: image processing in R [84]
DEXSeq: detecting differential usage of exons from RNA-seq data [18]
h5vc: scalable nucleotide tallies with HDF5 [12]
rhdf5: HDF5 interface to R
FourCSeq: analysis of 4C sequencing data [7]
SomaticSignatures: inferring mutational signatures from single-nucleotide variants [6]
BiocStyle: document formatting for executable documents

Publications from before 2012

[71] Mapping of signalling networks through synthetic genetic interaction analysis by RNAi. Thomas Horn, Thomas Sandmann, Bernd Fischer, Elin Axelsson, Wolfgang Huber, and Michael Boutros. Nature Methods, 8(4), 2011. pdf, url (74 citations). [72] Antisense expression increases gene expression variability and locus interdependency. Zhenyu Xu, Wu Wei, Julien Gagneur, Sandra Clauder-Münster, Miłosz Smolik, Wolfgang Huber, and Lars M. Steinmetz. Molecular Systems Biology, 7, 2011. pdf, url (65 citations). [73] cAMP response element-binding protein is a primary hub of activity-driven neuronal gene expression. E. Benito, L. M. Valor, M. Jimenez-Minchan, W. Huber, and A. Barco. Journal of Neuroscience, 31:18237–18250, 2011. pdf, url (31 citations). [74] Genome-wide survey of post-meiotic segregation during yeast recombination. Eugenio Mancera, Richard Bourgon, Wolfgang Huber, and Lars M. Steinmetz. Genome Biology, 12:R36, 2011. pdf, url (10 citations). [75] Contributions of the EMERALD project to assessing and improving microarray data quality. Vidar Beisvåg, Audrey Kauffmann, James Malone, Carole Foy, Marc Salit, Heinz Schimmel, Erik Bongcam-Rudloff, Ulf Landegren, Helen Parkinson, Wolfgang Huber, Alvis Brazma, Arne K. Sandvik, and Martin Kuiper. BioTechniques, 50:27–31, 2011. pdf, url. [76] Enterotypes of the human gut microbiome. Mani Arumugam, Jeroen Raes, E. Pelletier, D. Le Paslier, T. Yamada, D. R. Mende, G. R. Fernandes, J. Tap, T. Bruls, J. M. Batto, M. Bertalan, N. Borruel, F. Casellas, L. Fernandez, L. Gautier, T. Hansen, M. Hattori, T. Hayashi, M. Kleerebezem, K. Kurokawa, M. Leclerc, F. Levenez, C. Manichanh, H. B. Nielsen, T. Nielsen, N. Pons, J. Poulain, J. Qin, T. Sicheritz-Ponten, S. Tims, D. Torrents, E. Ugarte, E. G. Zoetendal, J.
Wang, F. Guarner, O. Pedersen, W. M. de Vos, S. Brunak, J. Dore, J. Weissenbach, S. D. Ehrlich, Peer Bork, Metagenomics Consortium:, M. Antolin, F. Artiguenave, H. M. Blottiere, M. Almeida, C. Brechot, C. Cara, C. Chervaux, A. Cultrone, C. Delorme, G. Denariaz, R. Dervyn, K. U. Foerstner, C. Friss, M. van de Guchte, E. Guedon, F. Haimet, Wolfgang Huber, J. van Hylckama-Vlieg, A. Jamet, C. Juste, G. Kaci, J. Knol, O. Lakhdari, S. Layec, K. Le Roux, E. Maguin, A. Merieux, R. Melo Minardi, C. M’rini, J. Muller, R. Oozeer, J. Parkhill, P. Renault, M. Rescigno, N. Sanchez, S. Sunagawa, A. Torrejon, K. Turner, G. Vandemeulebrouck, E. Varela, Y. Winogradsky, and G. Zeller. Nature, 473:174–180, 2011. pdf, url (141 citations). [77] Assessing Affymetrix GeneChip microarray quality. Matthew M. McCall, Peter N. Murakami, Margus Lukk, Wolfgang Huber, and Rafael A. Irizarry. BMC Bioinformatics, 12:137, 2011. pdf, url (23 citations). [78] Polymorphisms in CTNNBL1 in relation to colorectal cancer with evolutionary implications. S. Huhn, D. Ingelfinger, J. L. Bermejo, M. Bevier, B. Pardini, A. Naccarati, V. Steinke, N. Rahner, E. Holinski-Feder, M. Morak, H. K. Schackert, H. Gorgens, C. P. Pox, T. Goecke, M. Kloor, M. Loeffler, R. Buttner, L. Vodickova, J. Novotny, K. Demir, C. M. Cruciat, R. Renneberg, W. Huber, C. Niehrs, M. Boutros, P. Propping, P. Vodieka, K. Hemminki, and A. Forsti. Int J Mol Epidemiol Genet, 2:36–50, 2011. pdf, url. [79] Extracting quantitative genetic interaction phenotypes from matrix combinatorial RNAi. Elin Axelsson, Thomas Sandmann, Thomas Horn, Michael Boutros, Wolfgang Huber, and Bernd Fischer. BMC Bioinformatics, 12:342, 2011. pdf, url. [80] Relating CNVs to transcriptome data at fine-resolution: assessment of the effect of variant size, type, and overlap with functional regions. Andreas Schlattl, Simon Anders, Sebastian M. Waszak, Wolfgang Huber, and Jan O. Korbel. Genome Research, 21:2004–2013, 2011. pdf, url (42 citations). 
[81] Independent filtering increases detection power for high-throughput experiments. Richard Bourgon, Robert Gentleman, and Wolfgang Huber. PNAS, 107(21):9546–9551, 2010. pdf, url (152 citations). [82] Differential expression analysis for sequence count data. Simon Anders and Wolfgang Huber. Genome Biology, 11:R106, 2010. pdf, url (2460 citations). [83] Clustering phenotype populations by genome-wide RNAi and multiparametric imaging. Florian Fuchs, Gregoire Pau, Dominique Kranz, Oleg Sklyar, Christoph Budjan, Sandra Steinbrink, Thomas Horn, Angelika Pedal, Wolfgang Huber, and Michael Boutros. Molecular Systems Biology, 6(370), 2010. pdf, url (59 citations). [84] EBImage – an R package for image processing with applications to cellular phenotypes. Gregoire Pau, Florian Fuchs, Oleg Sklyar, Michael Boutros, and Wolfgang Huber. Bioinformatics, 26:979–981, 2010. pdf, url (60 citations). [85] Genome-wide analysis of mRNA decay patterns during early Drosophila development. Stefan Thomsen, Simon Anders, Sarath Chandra Janga, Wolfgang Huber, and Claudio R. Alonso. Genome Biology, 11:R93, 2010. pdf, url (36 citations). [86] Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division genes. Beate Neumann, Thomas Walter, Jean-Karim Heriché, Jutta Bulkescher, Holger Erfle, Christian Conrad, Phill Rogers, Ina Poser, Michael Held, Urban Liebel, Cihan Cetin, Frank Sieckmann, Gregoire Pau, Rolf Kabbe, Annelie Wuensche, Venkata Satagopam, Michael H. A. Schmitz, Catherine Chapuis, Daniel W. Gerlich, Reinhard Schneider, Roland Eils, Wolfgang Huber, Jan-Michael Peters, Anthony A. Hyman, Richard Durbin, Rainer Pepperkok, and Jan Ellenberg. Nature, 464(7289):721–727, 2010. pdf, url (348 citations). [87] CellCognition: time-resolved phenotype annotation in high-throughput live cell imaging. Michael Held, M. H. Schmitz, Bernd Fischer, Thomas Walter, Beate Neumann, M. H. Olma, M. Peter, Jan Ellenberg, and Daniel W. Gerlich.
Nature Methods, 7(9):747–754, 2010. pdf, url (93 citations). [88] Addressing accuracy and precision issues in iTRAQ quantitation. Natasha A. Karp, Wolfgang Huber, Pawel G. Sadowski, Philip D. Charles, Svenja V. Hester, and Kathryn S. Lilley. Molecular and Cellular Proteomics, 9:1885–97, 2010. pdf, url (176 citations). [89] Organelle proteomics experimental designs and analysis. Laurent Gatto, Juan Antonio Vizcaíno, Henning Hermjakob, Wolfgang Huber, and Kathryn S. Lilley. Proteomics, 2010. pdf, url (23 citations). [90] High-resolution transcription atlas of the mitotic cell cycle in budding yeast. Marina V. Granovskaia, Lars J. Jensen, Matthew E. Ritchie, Jörn Tödling, Ye Ning, Peer Bork, Wolfgang Huber, and Lars M. Steinmetz. Genome Biology, 11:R24, 2010. pdf, url (40 citations). [91] Variation in transcription factor binding among humans. Maya Kasowski, Fabian Grubert, Christopher Heffelfinger, Manoj Hariharan, Akwasi Asabere, Sebastian M. Waszak, Lukas Habegger, Joel Rozowsky, Minyi Shi, Alexander E. Urban, Mi-Young Hong, Konrad J. Karczewski, Wolfgang Huber, Sherman M. Weissman, Mark B. Gerstein, Jan O. Korbel, and Michael Snyder. Science, 328:232–235, 2010. pdf, url (266 citations). [92] Microarray data quality control improves the detection of differentially expressed genes. Audrey Kauffmann and Wolfgang Huber. Genomics, 95:138–142, 2010. pdf, url (28 citations). [93] A large-scale RNAi screen identifies Deaf1 as a regulator of innate immune responses in Drosophila. David Kuttenkeuler, Nadege Pelte, Anan Ragab, Viola Gesellchen, Lena Schneider, Claudia Blass, Elin Axelsson, Wolfgang Huber, and Michael Boutros. Journal of Innate Immunity, 2:181–194, 2010. pdf, url (20 citations). [94] Comparison of normalization methods for Illumina BeadChip(R) HumanHT-12 v3. Ramona Schmid, Patrick Baum, Carina Ittrich, Katrin Fundel-Clemens, Wolfgang Huber, Benedikt Brors, Roland Eils, Andreas Weith, Detlev Mennerich, and Karsten Quast. BMC Genomics, 11:349, 2010.
pdf, url (31 citations). [95] A global map of human gene expression. Margus Lukk, Misha Kapushesky, Janne Nikkila, Helen Parkinson, Angela Goncalves, Wolfgang Huber, Esko Ukkonen, and Alvis Brazma. Nature Biotechnology, 28:322–324, 2010. pdf, url (156 citations). [96] Bidirectional promoters generate pervasive transcription in yeast. Zhenyu Xu, Wu Wei, Julien Gagneur, Fabiana Perocchi, Sandra Clauder-Muenster, Jurgi Camblong, Elisa Guffanti, Francoise Stutz, Wolfgang Huber, and Lars M. Steinmetz. Nature, 457(7232):1033–1037, 2009. pdf, url (376 citations). [97] High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Eugenio Mancera, Richard Bourgon, Alessandro Brozzi, Wolfgang Huber, and Lars M. Steinmetz. Nature, 454(7203):479–485, 2008. pdf, url (253 citations). [98] The hwriter package. Gregoire Pau and Wolfgang Huber. The R Journal, 1(1):22–24, 2009. pdf, url. [99] Array-based genotyping in S. cerevisiae using semi-supervised clustering. Richard Bourgon, Eugenio Mancera, Alessandro Brozzi, Lars M. Steinmetz, and Wolfgang Huber. Bioinformatics, 25(8):1056–1062, 2009. pdf, url. [100] Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Steffen Durinck, Paul T. Spellman, Ewan Birney, and Wolfgang Huber. Nature Protocols, 4(8):1184–1191, 2009. pdf, url (108 citations). [101] Visualisation of genomic data with the Hilbert curve. Simon Anders. Bioinformatics, 25:1231–1235, 2009. pdf, url (27 citations). [102] ShortRead: a Bioconductor package for input, quality assessment and exploration of high-throughput sequence data. Martin Morgan, Simon Anders, Michael Lawrence, Patrick Aboyoun, Hervé Pagès, and Robert Gentleman. Bioinformatics, 25:2607, 2009. pdf, url (115 citations). [103] Genome-wide allele- and strand-specific expression profiling. Julien Gagneur, Himanshu Sinha, Fabiana Perocchi, Richard Bourgon, Wolfgang Huber, and Lars M. Steinmetz. Molecular Systems Biology, 5:274, 2009.
pdf, url (22 citations). [104] Quality assessment and data analysis for microRNA expression arrays. Deepayan Sarkar, R. Parkin, S. Wyman, A. Bendoraite, C. Sather, J. Delrow, A. K. Godwin, C. Drescher, Wolfgang Huber, Robert Gentleman, and Munesh Tewari. Nucleic Acids Research, 37(2), 2009. pdf, url (29 citations). [105] arrayQualityMetrics - a Bioconductor package for quality assessment of microarray data. Audrey Kauffmann, Robert Gentleman, and Wolfgang Huber. Bioinformatics, 25:415–416, 2009. pdf, url (17 citations). [106] Importing ArrayExpress datasets into R/Bioconductor. Audrey Kauffmann, Tim F. Rayner, Helen Parkinson, Misha Kapushesky, Margus Lukk, Alvis Brazma, and Wolfgang Huber. Bioinformatics, 25:2092–2094, 2009. pdf, url (17 citations). [107] Analyzing ChIP-chip data using Bioconductor. Jörn Tödling and Wolfgang Huber. PLoS Computational Biology, 4(11), 2008. pdf, url (13 citations). [108] Rintact: enabling computational analysis of molecular interaction data from the IntAct repository. Tony Chiang, Nianhua Li, Sandra Orchard, Samuel Kerrien, Henning Hermjakob, Robert Gentleman, and Wolfgang Huber. Bioinformatics, 24(8):1100–1101, 2008. pdf, url. [109] Model-based variance-stabilizing transformation for Illumina microarray data. Simon M. Lin, Pan Du, Wolfgang Huber, and Warren A. Kibbe. Nucleic Acids Res, 36(2), 2008. pdf, url (231 citations). [110] Combinatorial effects of four histone modifications in transcription and differentiation. Jenny J. Fischer, Jörn Tödling, Tammo Krüger, Markus Schüler, Wolfgang Huber, and Silke Sperling. Genomics, 91(1):41–51, 2008. pdf, url (23 citations). [111] Estimating node degree in bait-prey graphs. Denise Scholtens, Tony Chiang, Wolfgang Huber, and Robert Gentleman. Bioinformatics, 24(2):218–224, 2008. pdf, url (10 citations). [112] Florian Hahne, Wolfgang Huber, Robert Gentleman, and Seth Falcon. Bioconductor Case Studies. Use R. Springer, 2008. pdf, url (92 citations). 
[113] Coverage and error models of protein-protein interaction data by directed graph analysis. Tony Chiang, Denise Scholtens, Deepayan Sarkar, Robert Gentleman, and Wolfgang Huber. Genome Biology, 8(9), 2007. pdf, url (23 citations). [114] Making the most of high-throughput protein-interaction data. Robert Gentleman and Wolfgang Huber. Genome Biology, 8(10):112–112, 2007. pdf, url (24 citations). [115] Graphs in molecular biology. Wolfgang Huber, Vincent J. Carey, Li Long, Seth Falcon, and Robert Gentleman. BMC Bioinformatics, 8(Suppl. 6), 2007. pdf, url (42 citations). [116] Ringo – an R/Bioconductor package for analyzing ChIP-chip readouts. Jörn Tödling, Oleg Sklyar, Tammo Krüger, Jenny J. Fischer, Silke Sperling, and Wolfgang Huber. BMC Bioinformatics, 8:221–221, 2007. pdf, url. [117] In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation. Tineke Casneuf, Yves Van de Peer, and Wolfgang Huber. BMC Bioinformatics, 8:461–461, 2007. pdf, url (40 citations). [118] CoCo: a web application to display, store and curate ChIP-on-chip data integrated with diverse types of gene expression data. Charles Girardot, Oleg Sklyar, Sophie Grosz, Wolfgang Huber, and Eileen E. M. Furlong. Bioinformatics, 23(6):771–773, 2007. pdf, url. [119] Genomic organization of transcriptomes in mammals: Coregulation and cofunctionality. Antje Purmann, Jörn Tödling, Markus Schüler, Piero Carninci, Hans Lehrach, Yoshihide Hayashizaki, Wolfgang Huber, and Silke Sperling. Genomics, 89(5):580–587, 2007. pdf, url (32 citations). [120] High-throughput flow cytometry-based assay to identify apoptosis-inducing proteins. Mamatha Sauermann, Florian Hahne, Christian Schmidt, Meher Majety, Heiko Rosenfelder, Stephanie Bechtel, Wolfgang Huber, Annemarie Poustka, Dorit Arlt, and Stefan Wiemann. Journal of Biomolecular Screening, 12(4):510–520, 2007. pdf, url. [121] Comparative analysis of structured RNAs in S. cerevisiae indicates a multitude of different functions.
Stephan Steigele, Wolfgang Huber, Claudia Stocsits, Peter F. Stadler, and Kay Nieselt. BMC Biology, 5:25–25, 2007. pdf, url (22 citations). [122] A high-resolution map of transcription in the yeast genome. Lior David, Wolfgang Huber, Marina Granovskaia, Jörn Tödling, Curtis J. Palm, Lee Bofkin, T. Jones, Ron W. Davis, and Lars M. Steinmetz. PNAS, 103(14):5320–5325, 2006. pdf, url (393 citations). [123] Analysis of cell-based RNAi screens. Michael Boutros, Lígia P. Brás, and Wolfgang Huber. Genome Biology, 7(7), 2006. pdf, url (149 citations). [124] Transcript mapping with high-density oligonucleotide tiling arrays. Wolfgang Huber, Jörn Tödling, and Lars M. Steinmetz. Bioinformatics, 22(16):1963–1970, 2006. pdf, url (91 citations). [125] Statistical methods and software for the analysis of high-throughput reverse genetic assays using flow cytometry readouts. Florian Hahne, Dorit Arlt, Mamatha Sauermann, Meher Majety, Annemarie Poustka, Stefan Wiemann, and Wolfgang Huber. Genome Biology, 7(8), 2006. pdf, url (14 citations). [126] Reproducible statistical analysis in microarray profiling studies. Ulrich Mansmann, Markus Ruschhaupt, and Wolfgang Huber. Methods of Information in Medicine, 45:139–145, 2006. pdf, url. [127] The LIFEdb database in 2006. Alexander Mehrle, Heiko Rosenfelder, Ingo Schupp, Coral del Val, Dorit Arlt, Florian Hahne, Stephanie Bechtel, Jeremy Simpson, Oliver Hofmann, Winston Hide, Karl-Heinz Glatting, Wolfgang Huber, Rainer Pepperkok, Annemarie Poustka, and Stefan Wiemann. Nucleic Acids Research, 34(Database issue):415–418, 2006. pdf, url (21 citations). [128] Robert Gentleman, Florian Hahne, and Wolfgang Huber. Visualizing genomic data. Technical Report 10, Bioconductor Project Working Papers, 2006. pdf, url. [129] Image analysis for microscopy screens. Oleg Sklyar and Wolfgang Huber. R News, 6(5):12–16, 2006. pdf, url. [130] Transcript mapping with high-density tiling arrays. Matthew Ritchie and Wolfgang Huber. R News, 6(5):23–27, 2006.
pdf, url. [131] BioMart and Bioconductor: a powerful link between biological databases and microarray data analysis. Steffen Durinck, Yves Moreau, Arek Kasprzyk, Sean Davis, Bart De Moor, Alvis Brazma, and Wolfgang Huber. Bioinformatics, 21:3439–3440, 2005. pdf, url (286 citations). [132] Functional profiling: from microarrays via cell-based assays to novel tumor relevant modulators of the cell cycle. Dorit Arlt, Wolfgang Huber, Urban Liebel, C. Schmidt, Meher Majety, Mamatha Sauermann, Heiko Rosenfelder, Stefanie Bechtel, Alexander Mehrle, Detlev Bannasch, Ingo Schupp, Markus Seiler, Jeremy C. Simpson, Florian Hahne, Petra Moosmayer, Markus Ruschhaupt, Birgit Guilleaume, Ruth Wellenreuther, Rainer Pepperkok, Holger Sültmann, Annemarie Poustka, and Stefan Wiemann. Cancer Research, 65(17):7733–7742, 2005. pdf, url (19 citations). [133] Systematic comparison of surface coatings for protein microarrays. Birgit Guilleaume, Andreas Buness, C. Schmidt, F. Klimek, G. Moldenhauer, Wolfgang Huber, Dorit Arlt, Ulrike Korf, Stefan Wiemann, and Annemarie Poustka. Proteomics, 5:4705–4712, 2005. pdf, url (32 citations). [134] Gene expression in kidney cancer is associated with cytogenetic abnormalities, metastasis formation, and patient survival. Holger Sültmann, Anja von Heydebreck, Wolfgang Huber, Rupert Kuner, Andreas Buness, Markus Vogt, Bastian Gunawan, Martin Vingron, Laszlo Fuzesi, and Annemarie Poustka. Clinical Cancer Research, 11:646–655, 2005. pdf, url (52 citations). [135] arrayMagic: two-colour cDNA microarray quality control and preprocessing. Andreas Buness, Wolfgang Huber, Klaus Steiner, Holger Sültmann, and Annemarie Poustka. Bioinformatics, 21(4):554–556, 2005. pdf, url (36 citations). [136] Novel cancer relevant cell cycle modulators identified in automated cell-based assays. Dorit Arlt, Wolfgang Huber, Mamatha Sauermann, Meher Majety, Florian Hahne, Rainer Pepperkok, Annemarie Poustka, and Stefan Wiemann. European Journal of Cell Biology, 84(Suppl. 
55):30, 2005. [137] Wolfgang Huber, Anja von Heydebreck, and Martin Vingron. Bioinformatics - from Genomes to Therapies, chapter Low-level analysis of microarray experiments. Wiley-VCH, 2005. pdf. [138] On the synthesis of microarray experiments. Robert Gentleman, Markus Ruschhaupt, and Wolfgang Huber. Journal de la Société Française de Statistique, 146(1-2), 2005. pdf, url. [139] Robert Gentleman, Vincent J. Carey, Wolfgang Huber, Rafael Irizarry, and Sandrine Dudoit, editors. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer, 2005. url (1925 citations). [140] Bioconductor: open software development for computational biology and bioinformatics. Robert C. Gentleman, Vincent J. Carey, Douglas M. Bates, Ben Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, Laurent Gautier, Y.C. Ge, Jeff Gentry, Kurt Hornik, Torsten Hothorn, Wolfgang Huber, Stefano Iacus, Rafael Irizarry, Friedrich Leisch, Cheng Li, Martin Maechler, Anthony J. Rossini, Günther Sawitzki, Colin Smith, Gordon Smyth, Luke Tierney, Jean Y.H. Yang, and J.H. Zhang. Genome Biology, 5(10), 2004. pdf, url (5421 citations). [141] matchprobes: a Bioconductor package for the sequence-matching of microarray probe elements. Wolfgang Huber and Robert Gentleman. Bioinformatics, 20:1651–1652, 2004. pdf, url (25 citations). [142] A compendium to ensure computational reproducibility in high-dimensional classification tasks. Markus Ruschhaupt, Wolfgang Huber, Annemarie Poustka, and Ulrich Mansmann. Statistical Applications in Genetics and Molecular Biology, 3(37), 2004. pdf, url (90 citations). [143] Systematic analysis of T7 RNA polymerase based in vitro linear RNA amplification for use in microarray experiments. Jörg Schneider, Andreas Buness, Wolfgang Huber, Joachim Volz, Petra Kioschis, Mathias Hafner, Annemarie Poustka, and Holger Sültmann. BMC Genomics, 5(1):29, 2004. pdf, url (60 citations). [144] From ORFeome to biology: a functional genomics pipeline. 
Stefan Wiemann, Dorit Arlt, Wolfgang Huber, Ruth Wellenreuther, Simone Schleeger, Alexander Mehrle, Stephanie Bechtel, Mamatha Sauermann, Ulrike Korf, Rainer Pepperkok, Holger Sültmann, and Annemarie Poustka. Genome Research, 108:2136–44, 2004. pdf, url (35 citations). [145] Wolfgang Huber, Anja von Heydebreck, and Martin Vingron. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics, chapter Error models for microarray intensities. John Wiley & Sons, 2004. pdf (12 citations). [146] Anja von Heydebreck, Wolfgang Huber, and Robert Gentleman. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics, chapter Differential Expression with the Bioconductor Project. John Wiley & Sons, 2004. pdf (51 citations). [147] Multi-domain protein families and domain pairs: Comparison with known structures and a random model of domain recombination. Gordana Apic, Wolfgang Huber, and Sarah A. Teichmann. Journal of Structural and Functional Genomics, 4:67–78, 2003. pdf (76 citations). [148] Cytogenetic and morphologic typing of 58 papillary renal cell carcinomas: Evidence for a cytogenetic evolution of type 2 from type 1 tumors. Bastian Gunawan, Anja von Heydebreck, Thekla Fritsch, Wolfgang Huber, Rolf-Hermann Ringert, Gerhard Jakse, and László Füzesi. Cancer Research, 63:6200–6205, 2003. pdf, url (66 citations). [149] Mathematical tree models for cytogenetic development in solid tumors. Anja von Heydebreck, Bastian Gunawan, Wolfgang Huber, Martin Vingron, and Laszlo Füzesi. Verhandlungen der Deutschen Gesellschaft für Pathologie, 2003. [150] Parameter estimation for the calibration and variance stabilization of microarray data. Wolfgang Huber, Anja von Heydebreck, Holger Sültmann, Annemarie Poustka, and Martin Vingron. Statistical Applications in Genetics and Molecular Biology, 2(1):Article 3, 2003. pdf, url (158 citations). [151] Wolfgang Huber, Anja von Heydebreck, and Martin Vingron. Analysis of microarray gene expression data. 
In Martin Bishop et al., editor, Handbook of Statistical Genetics. John Wiley & Sons, Ltd, Chichester, UK, 2003. pdf (53 citations). [152] Prognostic factors influencing surgical management and outcome of gastrointestinal stromal tumours. C. Langer, Bastian Gunawan, P. Schüler, Wolfgang Huber, Laszlo Füzesi, and H. Becker. British Journal of Surgery, 90:332–399, 2003. pdf, url (114 citations). [153] Transcription profiling of renal cell carcinoma. Wolfgang Huber, Judith M. Boer, Anja von Heydebreck, Bastian Gunawan, Martin Vingron, László Füzesi, Annemarie Poustka, and Holger Sültmann. Verhandlungen der Deutschen Gesellschaft für Pathologie, 86:153–164, 2002. [154] Variance stabilization applied to microarray data calibration and to the quantification of differential expression. Wolfgang Huber, Anja von Heydebreck, Holger Sültmann, Annemarie Poustka, and Martin Vingron. Bioinformatics, 18 Suppl 1:96–104, 2002. pdf, url (1673 citations). [155] Identification and classification of differentially expressed genes in renal cell carcinoma by expression profiling on a global human 31,500-element cDNA array. Judith M. Boer, Wolfgang Huber, Holger Sültmann, Friederike Wilmer, Anja von Heydebreck, Stefan Haas, Bernhard Korn, Bastian Gunawan, Astrid Vente, Laszlo Füzesi, Martin Vingron, and Annemarie Poustka. Genome Research, 11(11):1861–1870, 2001. pdf, url (145 citations). [156] Prognostic impacts of cytogenetic findings in clear cell renal cell carcinoma: Chromosome translocation der(3)t(3;5) or gain of 5q predict a distinct clinical phenotype with favourable prognosis. Bastian Gunawan, Wolfgang Huber, Meike Holtrup, Anja von Heydebreck, Thomas Efferth, Annemarie Poustka, Rolf-Hermann Ringert, Gerhard Jakse, and László Füzesi. Cancer Research, 61:7731–7738, 2001. pdf, url (67 citations). [157] FLASHFLOOD: A 3D field-based similarity search and alignment method for flexible molecules. Michael C. Pitman, Wolfgang Huber, Hans Horn, Andreas Krämer, Julia E.
Rice, and William C. Swope. Journal of Computer-Aided Molecular Design, 15:587–612, 2001. pdf, url (18 citations). [158] Identifying splits with clear separation: A new class discovery method for gene expression data. Anja von Heydebreck, Wolfgang Huber, Annemarie Poustka, and Martin Vingron. Bioinformatics, 17 Suppl. 1:S107–114, 2001. pdf, url (77 citations). [159] Gene expression profiling of kidney cancer using a tumor-specific cDNA microarray. Holger Sültmann, Wolfgang Huber, Laszlo Fuzesi, Bastian Gunawan, Anja von Heydebreck, Martin Vingron, and Annemarie Poustka. Clinical Cancer Research, 7(11, Suppl. S):155, 2001. pdf, url. [160] Quasistationary distributions of dissipative nonlinear quantum oscillators in strong periodic driving fields. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Physical Review E, 61:4883–4889, 2000. pdf, url (26 citations). [161] Stochastic wave function method versus density matrix: a numerical comparison. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Computer Physics Communications, 104:46–58, 1997. pdf, url (16 citations). [162] Vestibular-neck interaction and transformation of sensory coordinates. Thomas Mergner, Wolfgang Huber, and Wolfgang Becker. Journal of Vestibular Research, 7:347–367, 1997. (100 citations). [163] Spatially resolved measurement and modeling of blood brain barrier permeability. Wolfgang Huber, Klaus Kopitzki, Jens Timmer, and Peter Warnke. Biomedizinische Technik, 41 suppl. 1:160, 1996. pdf. [164] Fast Monte Carlo algorithm for nonequilibrium systems. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Physical Review E, 53:4232–4235, 1996. pdf, url. [165] The three-loop model: a neural network for the generation of saccadic reaction times. Burkhart Fischer, Stefan Gezeck, and Wolfgang Huber. Biological Cybernetics, 72:185–196, 1995. pdf, url (27 citations). [166] The macroscopic limit in a stochastic reaction–diffusion process. 
Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Europhysics Letters, 30:69–74, 1995. pdf, url (26 citations). [167] Fluctuation effects on wave propagation in a reaction–diffusion process. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Physica D, 73:259–273, 1994. pdf, url (49 citations). [168] Wolfgang Huber. Dynamics of strongly driven open quantum systems. PhD thesis, University of Freiburg, 1998. pdf. [169] Wolfgang Huber. The description of reaction diffusion processes by master equations. Diploma thesis, University of Freiburg, 1994. pdf.