Research_Summary_2016 - Huber Group

Research Summary - Spring 2016
Wolfgang Huber
Research Group Leader and Senior Scientist
Contents
A Research Vision
B Summary of Work
C Future Plans
D Publications 2012-16
E List of External Grants (since 2012)
F Curriculum Vitae
A Research Vision
The unifying concept of my research is methodology: statistical expertise and the ability to invent
new methods. I apply these where closing a methodological gap will advance biology. I run an
interdisciplinary group with three main aims:
The first aim is to drive forward the state of the art of statistics in biology – that is, the science
of reasoning with uncertainty, making reliable inference based on incomplete, noisy or overwhelming
data. But I also understand statistics as an instrument for discovery: a set of tools that help humans see
interesting patterns in large datasets.
A second aim is to gain insight into pressing questions in drug-genotype interactions and precision
oncology through proficient use of statistical computing.
To achieve both of these aims, I closely collaborate with biomedical researchers who are equipped
with exciting novel technologies and are producing novel data types.
Thirdly, I aim to advance translational statistics by making methods usable not only for experts, but
for a wide range of users. This aim is embodied by my engagement in the Bioconductor project.
B Summary of Work
Research Highlights of the Last Four Years
From 2012 till present, 22 papers were published with W. Huber as corresponding author and/or group
members as (co-)first authors [1–18, 49,50,56,66]. There were 67 papers altogether (Section D). Several
are having an impact. Highlights are:
Statistical methods. We developed the first approach to false discovery rates in multiple testing that
permits data-driven hypothesis weighting [68]. The power gains can be large, and the method is broadly
applicable.
RNA-seq. We developed what have become standard tools for RNA-seq analysis, most prominently
DESeq2 for differential gene expression analysis [8, 10]. Moreover, we published htseq, DEXSeq [18]
and a method for single-cell RNA-seq [49]. We used DEXSeq for a novel contribution to the debate
on ’junk’ versus ’function’ in alternate RNA isoforms [15]. We used statistical modelling to weigh the
extent of stochasticity and regularity in the promiscuous gene expression of medullary thymic epithelial
cells [4].
Other ’omics. We developed methods and Bioconductor packages for cancer genome sequencing [6,
12], 4C-seq [7] and iCLIP [25] and applied these in numerous collaborative projects. We contributed to
the adaptation of the DESeq2 framework to other data types, such as Ribo-seq and ChIP-seq.
Translational statistics. Many powerful mathematical and computational methods exist but are difficult to access for a majority of biomedical scientists. We translate advanced ideas into practical methods and software. I took responsibility for the European presence of Bioconductor [1], a widely used
bioinformatics software project, through organising developer conferences, annual summer courses and
obtaining EC network grants (RADIANT, SOUND) for the project.
Gene-gene & gene-drug interactions. We developed a method for automated inference of the direction of epistatic genetic interactions from high-content phenotyping data [2]. We partnered with the
National Centre for Tumour Diseases (NCT) to translate high-dimensional phenotyping and gene-drug
interaction screening into practical personalized medicine.
Systems microscopy. We developed methods for estimating quantitative biophysical models from time-resolved microscopy data and applied these in several successful collaborations with developmental
biologists [5, 24, 48, 56].
Thermal proteome profiling. We recently started work on statistical methodology and computational
infrastructure for thermal proteome profiling [9, 27]. Our aim is to make the technology as widely accessible and usable as possible – a new ’workhorse’ for scientists both in fundamental and pharmaceutical
research.
Reproducible research. All our major papers are accompanied by a complete transcript of all computations from raw data to figures, tables and numbers reported in the paper.
Further details on a selection of the above-mentioned highlights are given in the following.
B.1 Translational Statistics
The adjective translational is sometimes used for efforts to translate biological discoveries into something useful for medicine. I use the term translational statistics for efforts to make sophisticated mathematical discoveries and computational methods accessible to a wide range of natural scientists.
I have contributed to the Bioconductor project since 2002 [1, 8, 53, 112, 139, 140]. The project
has been providing an energetic, fast-moving platform to the research community for collaborative,
interoperable, scientifically leading software in genomics and quantitative biology. It has also become
a platform for the publication of bioinformatic software that many authors aspire to. Bioconductor is
the largest software project in bioinformatics, with several thousand users and hundreds of developers
worldwide. It comprises more than 1000 software packages. I have outlined the aims of the project, and
the means by which we achieve them, in a recent perspective paper [1].
My particular role in the project has been in the provision of mathematically sophisticated packages
for the primary data analysis of popular technologies [6–10, 12, 18, 50, 68, 71, 82, 88, 104, 105, 109, 124,
154]. For some of them it might be fair to say that they were among the “killer applications” that helped
bring new users to Bioconductor. A list of those software packages is provided in Section F, heading
Software.
An important goal has been to facilitate interoperability of R/Bioconductor with other software
projects. For instance, the rhdf5 package provides an interface to the HDF5 data storage system. HDF5
is used in high-performance computing and permits efficient exchange of large, array-shaped datasets
between different software systems. The RBioFormats package provides an interface to BioFormats¹,
the leading solution for reading vendor-specific microscopy image data and metadata formats. The
package lpsymphony interfaces to the powerful SYMPHONY optimisation package, an open-source
solver for mixed-integer linear programmes.
Since 2005, an annual general Bioconductor conference has been held in the US each summer. Since
2010, I have coordinated annual European developer conferences, which take place in the winter and
alternate between the UK and the continent (Heidelberg, Zurich). They usually attract 40-50 active and
future package developers. For new users, I organise the annual CSAMA summer schools in Brixen,
South Tyrol, which have taken place every year since 2004. These week-long compact courses host
around 60 participants (places are usually booked out quickly) and are taught by high-calibre teachers,
incl. R. Gentleman, M. Morgan, V. Carey, M. Love, S. Anders. Since 2015, I have been involved
in the organisation of bi-annual Statistics in Genomics workshops at the ETH’s wonderful conference
centre on Monte Verità, Ascona, Switzerland. To support Bioconductor development, I have co-written
the EC network grant RADIANT (2012-2015) and am coordinating the SOUND project (2015-2018).
These grants include leading contributors to Bioconductor in Europe. They provide not only research
funding, but also positions for staff to work on strategically important infrastructure- or support-oriented
tasks. SOUND also includes a US partner, M. Morgan, the leader of the Bioconductor project.
B.2 DESeq, DESeq2 and DEXSeq
DESeq is a method and software package for the differential analysis of count data from high-throughput
sequencing that we published in 2010 [82]. Since then, it has been cited over 2,400 times². With
DESeq2, we have greatly extended its statistical sophistication (Figure 1) and the range of its applications, and improved the software user interface and robustness, documentation and associated training
material [10]. The method is based on generalized linear models and uses empirical Bayes methodology to permit model parameter estimation even in the case of few (e. g., two) replicates. Contrary to
some misperceptions, using only such a ’small’ number of replicates is a reasonable, scientifically and
economically efficient choice for designed experiments³. It is supported by progress in statistical modelling
–in particular, empirical Bayes methodology and hierarchical models that share information between
genes– over the last 15 years. Since its publication in December 2014, the DESeq2 paper [10] has been
cited over 110 times⁴, and the package was downloaded from >35,000 unique IP addresses over the last
year.
Figure 1: DESeq2 uses empirical Bayes methodology to obtain stable estimates of logarithmic fold
changes (LFC) and variances even when the number of replicates is small. In this figure from
reference [10], panels A and B show MA-plots of the maximum likelihood (ML, A) and maximum a
posteriori (MAP, B) estimates of LFC. Two genes with similar mean count and MLE LFC are highlighted
by green and purple circles, and their normalized count data are shown in panel C. The green gene has
low dispersion, the purple gene, high dispersion. Panel D shows the densities of the likelihoods (solid
lines), posteriors (dashed) and the empirical Bayes prior (solid black).
¹ http://www.openmicroscopy.org/site/products/bio-formats
² ISI Web of Science
³ It is helpful to distinguish between designed experiments, performed under well-controlled laboratory conditions, and
studies, done with cohorts ’in the wild’, e. g., human subjects in the clinic. For the latter, large cohort sizes (100s, 1000s)
DESeq2 is an example of relatively sophisticated statistical methodology that makes a practical
difference to biologists.
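The shrinkage idea behind Figure 1 can be illustrated with a deliberately simplified sketch. It assumes a normal likelihood for each gene's maximum likelihood LFC and a zero-centred normal prior of known width; DESeq2 itself fits negative binomial generalized linear models and estimates the prior from the data [10]. The function name, parameters and numbers below are illustrative assumptions, not the package's algorithm or interface.

```python
def shrink_lfc(mle_lfc, std_err, prior_sd):
    """Posterior mean (equal to the MAP) of a log fold change.

    Assumed toy model: mle_lfc ~ Normal(true_lfc, std_err^2) and
    true_lfc ~ Normal(0, prior_sd^2). The estimate is pulled towards
    zero more strongly when the standard error is large.
    """
    weight = prior_sd ** 2 / (prior_sd ** 2 + std_err ** 2)
    return weight * mle_lfc


# Two hypothetical genes with the same ML estimate but different noise levels,
# mirroring the green (low dispersion) and purple (high dispersion) genes.
low_dispersion = shrink_lfc(mle_lfc=2.0, std_err=0.5, prior_sd=1.0)
high_dispersion = shrink_lfc(mle_lfc=2.0, std_err=2.0, prior_sd=1.0)

print(f"low dispersion gene:  MLE 2.0 -> shrunken {low_dispersion:.2f}")
print(f"high dispersion gene: MLE 2.0 -> shrunken {high_dispersion:.2f}")
```

In this toy setting the noisier gene is shrunk much more strongly towards zero, which is the qualitative behaviour that panels A to D of Figure 1 describe.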
We also published htseq and htseq-count for counting the overlap of aligned sequencing
reads with genomic features. This is a basic step in the processing of RNA-seq data. The paper associated with the software [11] has been cited over 300 times and is mentioned in >1,000 papers according
to the full-text search of PubmedCentral⁵. It is the most prominent implementation of the ’counting’
approach to RNA-seq⁶.
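The 'counting' step can be sketched in a few lines. The example below is a minimal illustration that assumes each gene is a single interval and each read a single aligned interval; the data, the function names and the rule for reads that hit several genes are hypothetical and do not reproduce the htseq-count implementation or its options.

```python
from collections import Counter


def overlaps(a_start, a_end, b_start, b_end):
    """True if the half-open intervals [a_start, a_end) and [b_start, b_end) overlap."""
    return a_start < b_end and b_start < a_end


def count_reads(reads, features):
    """Count reads per gene.

    reads: list of (chromosome, start, end) tuples.
    features: dict mapping gene name to (chromosome, start, end).
    A read is assigned to a gene only if it overlaps exactly one gene;
    other reads are tallied under special labels (an assumed convention).
    """
    counts = Counter()
    for chrom, start, end in reads:
        hits = {
            gene
            for gene, (f_chrom, f_start, f_end) in features.items()
            if f_chrom == chrom and overlaps(start, end, f_start, f_end)
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts


# Hypothetical annotation with two partially overlapping genes.
features = {"geneA": ("chr1", 100, 200), "geneB": ("chr1", 180, 300)}
reads = [
    ("chr1", 110, 150),  # geneA only
    ("chr1", 120, 160),  # geneA only
    ("chr1", 190, 230),  # overlaps geneA and geneB
    ("chr1", 250, 290),  # geneB only
    ("chr2", 100, 150),  # no annotated gene
]
counts = count_reads(reads, features)
print(dict(counts))
```

The resulting table of integer counts per gene is the kind of input that differential expression methods such as DESeq2 operate on.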
DEXSeq [18] addresses alternative isoforms. In comparison to approaches that try to reconstruct
full transcripts before testing them for differential abundance across conditions, DEXSeq short-circuits
the assembly and looks for differential exon usage directly. It has performed well in recent benchmarks⁷
compared to the aforementioned approaches – a result of the fact that the goal of full mammalian
transcript reconstruction from Illumina HiSeq short reads remains elusive⁸.
Drift and conservation of differential exon usage across tissues in primate species. Using
DEXSeq on multi-species, multi-tissue data, we have made a contribution to the discussion of ’junk’
versus ’function’ in alternate RNA isoforms [15]. We found that for a large fraction of tissue-specific
isoform diversity seen in primates, the tissue-specific expression is not conserved even between closely
related species. On the other hand, for the subset of highly expressed tissue-specific isoforms (3,800
exons in 1,643 genes), we do detect conserved tissue-specific usage across species. To the extent that
such conservation is an indicator of selection for function, our analysis supports the view that, by and
large, alternative isoform usage is leaky and noisy at low abundance levels, but more tightly controlled
and functional for higher abundance transcripts.
³ (continued) are needed, and analysis methods can be less reliant on the limma / edgeR / DESeq2-style empirical Bayes approach to
information sharing across genes.
⁴ More than 450 if references to the bioRχiv preprint are included.
⁵ http://www.ncbi.nlm.nih.gov/pmc/?term=htseq
⁶ More recently, methods that circumvent the alignment and feature-counting steps by directly assigning reads to target
sequences via k-mer matching, such as sailfish, are gaining traction. Eventually, this approach is likely to make the use cases
for htseq-count less numerous – but not those for differential expression analysis, i. e., DESeq2 or its related methods.
⁷ E. g., Soneson et al. (2015) http://dx.doi.org/10.1101/025387
⁸ Steijger et al. (2013) http://dx.doi.org/10.1038/nmeth.2714
For single cell RNA-seq data, Simon Anders published an influential method for distinguishing
true biological variability from technical variability [49]. We used it to resolve a debate on the extent
of stochasticity and regularity in the promiscuous gene expression programmes of medullary thymic
epithelial cells [4]. Also with the Steinmetz lab, we mapped cell-to-cell variability of 3’ isoform choice
by single-cell polyadenylation site mapping [32].
Extension to other data types. A special highlight here was that the very first high-throughput
CRISPR/Cas9 screen⁹ was analysed with DESeq2. The fact that this was done without our direct
involvement speaks for the usability of the software. As for our own efforts, we focused on high-throughput chromosome conformation capture assays, specifically 4C, HiC and ChIA-PET. We developed the Bioconductor package FourCSeq [7], applied it to research reported in Nature [42] and
presented further results on analysis of HiC data in [69].
Documentation and usability. We published an end-to-end RNA-seq data analysis protocol oriented to practitioners in Nature Protocols [50]. This was written as a consensus document together
with the authors of the main competing package, edgeR. Two years later, we provided an updated and
distinctly extended version in F1000Research [8].
B.3 Cancer Genomics
We developed the h5vc package, which leverages the high-performance data storage system HDF5
together with R/Bioconductor for large-scale analyses of genome sequencing data [12]. We also published the SomaticSignatures package, which identifies mutational signatures of single nucleotide variants (SNVs) in tumour genomes [6]. It provides infrastructure related to the methodology described by
Nik-Zainal (2012, Cell). We applied these tools in numerous collaborative projects, including the HeLa
genome [17] and the first data-based estimation of position-specific error rates for each base in the human
genome¹⁰.
B.4 Multiple Testing, False Discovery Rates and Hypothesis Weighting
When functional genomics data became available in the 1990s, a spike of interest arose in the topic
of multiple testing. With the adoption of the false discovery rate (FDR) as a common experiment-wide summary and with practical computational methods¹¹, it seemed for a while that the topic was
settled. However, as the size and complexity of datasets have increased, researchers have realized a
major limitation of the currently used FDR methods: the exchangeability assumption. The only information
used from each hypothesis test is its p-value. Other potentially useful information –such as the
power of the test, the observed effect size, the prior probability of the null hypothesis– is effectively
ignored. Although various ad hoc fixes and heuristics existed, they were unsatisfactory since they were
statistically inefficient, required manual ad hoc tuning, or were even fallacious. Our work provides a
principled, data-driven and statistically near-optimal solution to the problem [68]. It generalizes earlier
work [10, 81].
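To make the idea of hypothesis weighting concrete, the sketch below implements the classical weighted Benjamini-Hochberg procedure, in which each p-value is divided by a non-negative weight and the weights average to one. This is only an illustration: the weights are fixed by hand, whereas the method in [68] learns them from the data, and the function name and example numbers are assumptions made for this example.

```python
def weighted_bh(pvalues, weights, alpha=0.1):
    """Weighted Benjamini-Hochberg step-up procedure.

    Returns a list of booleans marking the rejected hypotheses.
    Weights must be non-negative and average to one, so that the
    overall error budget matches that of the unweighted procedure.
    """
    m = len(pvalues)
    if len(weights) != m:
        raise ValueError("pvalues and weights must have the same length")
    if any(w < 0 for w in weights) or abs(sum(weights) / m - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and average to 1")

    # A hypothesis with zero weight can never be rejected.
    scaled = [p / w if w > 0 else float("inf") for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: scaled[i])

    # Largest rank k such that the k-th smallest scaled p-value is <= alpha * k / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if scaled[i] <= alpha * rank / m:
            k = rank

    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


pvalues = [0.001, 0.02, 0.06, 0.3, 0.6, 0.9]
uniform = [1.0] * 6
# Hypothetical weights: suppose a covariate suggests the first three tests have higher power.
informed = [1.5, 1.5, 1.5, 0.5, 0.5, 0.5]

unweighted_result = weighted_bh(pvalues, uniform)
weighted_result = weighted_bh(pvalues, informed)
print("unweighted:", unweighted_result)
print("weighted:  ", weighted_result)
```

With uniform weights the procedure reduces to the ordinary Benjamini-Hochberg method. In this toy input, up-weighting the first three hypotheses leads to one additional rejection at the same nominal level.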
B.5 Gene-Gene and Gene-Drug Interactions
Automated phenotyping from microscopy image analysis. Microscopy-based readouts are more
informative for phenotyping than bulk viability or reporter assays, by providing single-cell resolved
data on processes such as cell cycle and proliferation, cell migration, trafficking and organelle morphology. We have created an R-based infrastructure –in particular our Bioconductor package EBImage– to support such high-throughput workflows, and have applied it widely in successful collaborations [2, 3, 5, 14, 24, 41, 44, 48, 52, 54, 56, 65, 67]. In comparison to other tools (e. g., CellProfiler, Matlab,
ImageJ/Fiji), the strengths of our solution lie in the combination of functionality, speed and scriptability.
In terms of functionality, we leverage R’s rich toolset for statistics, machine learning and publication-quality
data visualisation.
Another output of general interest is our new feature selection method [2]. It combines attractive
properties of linear rotation methods (such as principal component analysis, linear discriminant analysis),
namely, non-redundancy and signal-to-noise based dimension selection, with the advantages of
feature selection, namely, interpretability and portability.
Figure 2: Unsupervised clustering of drugs based on the correlation of their imaging-based high-content
phenotypes in 12 different cell lines [3]. The correlation distances between each pair of compounds are
shown in the upper left half of the matrix. For comparison, the lower right shows the structural
similarities (Tanimoto distances).
[Figure 2 image not reproduced. Colour scales (low to high): similarity of multiparametric interaction
profiles; structural similarity of compounds. Compound clusters are labelled C1 to C18. Source page
header: Molecular Systems Biology, "Integrated phenotypic and pharmacogenetic compound profiling",
Marco Breinig et al; published online December 23, 2015.]
⁹ Zhou et al. (2014) http://dx.doi.org/10.1038/nature13166
¹⁰ Julian Gehring’s PhD thesis; paper to be published
¹¹ Most prominently, the method of Benjamini and Hochberg.
We performed the first gene-gene interaction screen by combinatorial RNAi in human cells [14, 44].
We demonstrated the power of genetically engineered cell lines and high-content phenotyping for discovering drug-gene interactions (Figure 2 [3]).
We invented a new method for deducing directionality in gene-gene interaction data (Figure 3).
The inferred directed arrows can often be related to temporal, logical, or causal hierarchy of the targeted
gene products [2]. The method is applicable to multivariate phenotypes, and in particular to features
from high-content screening. Besides gene-gene interactions, it will also be applicable to gene-drug or
drug-drug interactions.
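The geometric idea can be sketched as follows. Following the legend of Figure 3, the expected double knockdown phenotype of two non-interacting genes is taken to be the sum of the single gene effects, and the interaction vector π is the deviation of the measured double knockdown from that expectation. The cosine threshold, the function names and the example vectors are hypothetical choices for illustration; the published method [2] fits models to replicated multivariate data and is not reproduced here.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cosine(u, v):
    """Cosine of the angle between two non-zero vectors."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def interaction_vector(a, b, ab):
    """pi = measured double knockdown minus the non-interacting expectation a + b."""
    return [d - (x + y) for x, y, d in zip(a, b, ab)]


def classify_interaction(a, b, ab, min_abs_cos=0.9):
    """Report whether pi is (anti-)parallel to one of the single gene effects.

    a, b: non-zero phenotype vectors of the single knockdowns (negative control at origin).
    ab:   phenotype vector of the double knockdown.
    Returns (orientation, gene), where gene is "A" or "B" or None.
    """
    pi = interaction_vector(a, b, ab)
    if math.sqrt(dot(pi, pi)) < 1e-12:
        return ("none", None)  # no deviation from the additive expectation
    cos_a, cos_b = cosine(pi, a), cosine(pi, b)
    gene, cos = max((("A", cos_a), ("B", cos_b)), key=lambda t: abs(t[1]))
    if abs(cos) < min_abs_cos:
        return ("undirected", None)  # interaction present, but not aligned with a or b
    return ("parallel" if cos > 0 else "anti-parallel", gene)


# Hypothetical two-dimensional phenotypes (e.g. cell number, nuclear area).
a = [1.0, 0.0]
b = [0.0, 2.0]
directional = classify_interaction(a, b, ab=[0.2, 2.0])  # phenotype of A largely reverted
additive = classify_interaction(a, b, ab=[1.0, 2.0])     # exactly a + b
print(directional, additive)
```

In the first example π points opposite to the single gene effect of A, which corresponds to the situation described for sti and Cdc23 in Figure 3, where the interaction is directed opposite to the phenotype of one of the two genes.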
We are currently pushing forward this line of work from laboratory cell lines to large cohorts of
primary cancer cells, in an exciting collaboration with haematologists at the National Centre for Tumour
Diseases (Figure 4).
[Figure 2 source page footer: Figure 5, Molecular Systems Biology 11: 846 | 2015, © 2015 The Authors]
B.6 Reproducible research
We have established a system of supplementary information that we use for all our major papers. It
allows readers to fully reproduce the reported results from raw data to all figures, tables and numbers.
We provide this information as packages for the free, open-source R system, most of them hosted on Bioconductor¹².
The packages contain the raw data files, custom-written procedures incl. standard R-style documentation
in manual pages, and literate programming documents. These are documents authored with the knitr
system that mix computer code and human-readable narrative and are executable by anyone. In this
way, readers can not only reproduce what we did, but also check the effect of variations of our analysis
choices on the results. Moreover, they may take our methods and adapt them to their data.
Besides the direct utility of this information, our aim is also to demonstrate across a range of journals and communities that it is possible to move beyond supplementary information in static PDF files
to support a paper. These include:
Topic | Journal | Package/URL
Single cell transcriptome analysis in the early mouse embryo | Nature Cell Biology [40] | Hiiragi2013
Live-cell microscopy study of cell migration in the fish embryo | Nature [48] | DonaPLLP2013
First comprehensive RNA interactome | Cell [66] | Website
Map of genetic interactions in human cancer cells with RNAi and multiparametric phenotyping | Nature Methods [14] | HD2013SGI
Large-scale directional genetic interaction map in fly | eLife [2] | DmelSGI
Mapping of signalling networks through synthetic genetic interaction analysis by RNAi | Nature Methods [71] | RNAinteractMAPK
Chemical-genetic interaction map of small molecules using high-throughput imaging in cancer cells | Mol. Syst. Biol. [3] | PGPC
Single Cell RNA-Seq | Nature Immunology [4] | Single.mTEC.Transcriptomes
Protein turnover in embryos based on tandem fluorescent timer microscopy | Development [5] | TimerQuant
RNA-Seq analysis end-to-end workflow | F1000 Research [8] | Webpage
RNA-Seq analysis method | Genome Biology [10] | DESeq2, Webpage
Dynamical modelling of cell cycle phenotypes from genome-wide RNAi live-cell imaging | BMC Bioinformatics [16] | mitoODEdata
Drift and conservation of differential exon usage across tissues in primate species | PNAS [15] | PDF vignette
Differential exon usage from RNA-Seq method | Genome Research [18] | DEXSeq, pasilla
Furrow segmentation in live imaging of optogenetic experiment | Developmental Cell [24] | furrowSeg
Multiple testing methods paper | bioRχiv [68] | github
¹² https://bioconductor.org
Figure 3: Data-based inference of directional epistatic genetic interactions. Multivariate phenotypes
were computed from images of cells treated with combinatorial libraries of single and double
RNAi knockdowns [2]. Each phenotype was represented as an n-dimensional vector; the origin of
the vector space was fixed such that the null vector is the negative control. For visualisation, here
n = 2: cell number and area of nuclei. In [2], we used n = 21. It turns out that in many cases
the double knockdown phenotype vector of two genes A and B is approximately collinear with
that of one of the two genes, but is either increased or decreased. These four scenarios are depicted
schematically on the left. The middle and right panels show data for two exemplary genes, sti and
Cdc23, for four replicate experiments. The data are best fit by model B→A, indicating that loss of
function of Cdc23 reverts the phenotype of sti. Biologically, this is explained by the fact that the
cytokinesis regulator sti acts chronologically after the APC/C member Cdc23 in mitosis. In Fig. 5
of the paper [2] we showed how to derive a dense network of such directional epistatic interactions
for mitosis-relevant genes. Note: the images shown here represent only a small zoom-in view of the
images analysed. Fischer et al. eLife 2015;4:e05464. DOI: 10.7554/eLife.05464
[Legend embedded in the Figure 3 image (Figure 4 of [2]; panel title: Cdc23→sti):
Figure 4. Deriving directional genetic interactions. (A) Multiparametric phenotypes are extracted for
single and double knockdowns. Genetic interaction scores are computed for each double knockdown
experiment. The schematic plots in the third column show the model for identifying directional genetic
interactions between gene A and gene B using two exemplary phenotypes. The single knockdown
phenotypes of genes A and B and the measured double knockdown phenotypes (AB) are depicted as
arrows. The expected double knockdown phenotype for non-interacting (NI) genes, which is the sum
of the single gene effects, is depicted by the symbol NI. Black arrows depict the genetic interaction π.
The first row shows the case where genes A and B are not interacting. Below, four types of interaction
between the genes A and B are shown: gene A is alleviating to gene B, gene A is aggravating to gene
B; and in reverse, B alleviates or aggravates gene A. Whenever the genetic interaction (black arrows)
is parallel or anti-parallel to one of the single gene effects, a directional genetic interaction is called.
(B–D) A directional interaction detected between Cdc23 and sti. (C) The two orange and two blue
arrows show the phenotypes (nuclei area and cell number) of the two dsRNAs designed for sti and
Cdc23. The grey dots show the expected double knockdown effect for the two genes. The black arrows,
indicating the genetic interaction, are directed opposite to the phenotype of sti, indicating that
functional (legend continued on the next page of the source)]
Collaborator | Institution | Topic, resulting publications and funding
Lars Steinmetz | EMBL | Transcriptomics, systems genetics [4, 15, 17, 21, 32, 45, 49, 51, 58, 69, 72, 74, 90, 96, 97, 99, 103, 122, 124]
Michael Boutros | DKFZ | Gene-gene and gene-drug interactions, high-content phenotyping [2, 3, 14, 44, 71, 78, 79, 83, 84, 93, 123]
Martin Morgan | RPCI (Buffalo, USA) | Bioconductor – software for genome-scale data analysis [1, 53, 102]. Funding: BIGDATA, SOUND
Thorsten Zenz | NCT, DKFZ | Cancer pharmacogenomics [28, 31, 70]. Funding: SOUND, TRANSCAN GCH-CLL
Jan Korbel | EMBL | Cancer genomics [17, 80, 91]. Funding: BioTop, HD-HuB
Mikhail Savitski | EMBL | Thermal proteome profiling – statistical method development [9, 27]
Eileen Furlong | EMBL | 4C data analysis [7, 42, 118]
Jeroen Krijgsveld | EMBL | Mass spectrometry based quantitative proteomics [13, 26, 43, 59, 61, 64, 66]
Susan Holmes | Stanford | Statistical methods for high-throughput biology
Jan Ellenberg | EMBL | Systems microscopy [16, 86, 87]. Funding: Systems Microscopy
Darren Gilmour | EMBL | Quantitative modelling from live cell imaging of cell migration [48, 5]
Stefano de Renzis | EMBL | Optogenetic study of tissue morphogenesis [24]
Takashi Hiiragi | EMBL | Single cell transcriptomics [40]
Michael Knop | Heidelberg | Quantitative modelling of microscopy data for protein turnover [41, 67]
Andreas Trumpp | DKFZ | RNA-seq data analysis [13, 20, 43]
Matthias Hentze | EMBL | RNA interactome – statistical method development [25, 60, 61, 63, 66]. Funding: joint EIPOD
Alvis Brazma | EBI | Quantitative methods for RNA-seq; imaging bioinformatics [29, 46, 75, 95, 105, 106, 131]. Funding: Systems Microscopy
Gitte Neubauer, Gerard Drewes | Cellzome / GSK | Thermal proteome profiling, high-content phenotyping and multi-omics [9, 27]. Funding: GSK postdoc fellowship; joint EIPOD
Judith Zaugg | EMBL | eQTL analysis – statistical method development [30, 68]
Peer Bork | EMBL | Bioinformatics pipelines, statistical methods [76]. Funding: HD-HuB
Table 1: Overview of collaborations. Resulting publications and joint research grants (see also Section E) are stated where available.
C Future Plans
Biostatistics for the 21st Century
The ultimate goal of my research is the successful application of multi-omics and computational reasoning to personalised health and medicine. My distinctive mark will be the combination of statistical
methods innovation and practical application to leading-edge experiments or studies. I will continue
to search out collaborations with biotechnology developers and biomedical researchers. I also plan to
invest in the immersion of physician-scientists into genomic big data analysis.
In terms of data types, the leading themes will be:
• New technologies in nucleotide sequencing, proteomics, imaging, real-time monitoring
• Pervasive longitudinal multi-omic data
• Single-cell resolution for ever more assays
• High-throughput genetics and precision oncology
In terms of methods:
• Data heterogeneity, data missing not at random and other biases
• Structured learning
• Translational statistics
C.1 New Technologies
I aim to create innovative computational algorithms to mine the big and complex data that arise as part
of developing new biotechnologies and applying them to novel areas of biology. Successful examples
include microarrays [112,139,154], tiling arrays [122,124], collaborative statistical computing [1, 139],
RNAi [14, 71, 123], RNA-seq [8, 10, 11, 18, 50, 82], 4C [7, 42], iCLIP [25], single-cell RNA-seq [4, 49],
high-content phenotyping [2, 3, 54, 83, 84], iTRAQ [88], thermal proteome profiling [9, 27]. Current
foci are:
• Thermal proteome profiling and other applications of quantitative mass spectrometry
– Data-driven biophysical modelling of melting curves
– Rich multiparametric hierarchical models and (empirical) Bayes methods to make them
identifiable from data
• Single cell sequencing
– Dimension reduction, detection and quantitative modelling of underlying structures: trajectories, gradients, bifurcation points, Waddington landscapes
– Integrating multiple layers of data (e. g., DNA, transposase-accessible chromatin, RNA)
• Imaging-based phenotyping of tumour models
• High-throughput genetics
I have always been keen to spot opportunities that might arise from early access to exciting new
data types. Potential fields of future engagement are imaging (high-throughput super-resolution microscopy for spatially resolved single-cell ‘omics), microfluidics, high-throughput synthetic biology
(e. g., CRISPR), third-generation sequencing.
C.1.1 Pervasive Longitudinal Multi-Omic Data
Humans are now the best-studied model organism. There are 7 billion individuals to be genotyped and
phenotyped. There is a potential for extremely rich phenotypes, as the costs do not need to be borne by
research budgets. We can use data from clinics, which among other things are large phenotyping centres
funded by health systems¹³. Moreover, wearable devices and the Internet of Things are emerging. They
will provide rich data on life-styles and physiological parameters also from healthy humans.
‘Omic datasets of the past were from single time points, were pieced together from ad hoc cohorts,
had small sample sizes and used a single technology (e. g., microarrays). In contrast, datasets of the
future will be pervasive (large cohorts, commoditized technologies), will be assayed at many time points
during healthy life and disease, and use multiple ‘omic technologies to cover the range of relevant
biology.
¹³ In 2013, 17.1% of the GDP of the USA was spent on health care, compared to 2.8% for research and development (incl.
all sectors, not only health). Source: The World Bank, http://wdi.worldbank.org/table/2.15 and http://wdi.worldbank.org/table/5.13
Taken together, these developments will allow us to drive forward personalized medicine –the
use of ‘omics and systems biology in evidence-based medicine– and personalized health – managing
healthy life using new technologies (cf. the conference I co-organised, Section F).
To help address associated challenges, I have assembled the international research network SOUND¹⁴.
SOUND is funded by the European Commission within its Horizon 2020 Research and Innovation
programme “Personalising Health and Care” and runs from 9/2015 to 8/2018. The partners comprise bioinformatician-statisticians and physician-scientists from leading institutions in personalized
medicine including NCT and EMBL Heidelberg, ETH and University Hospital Zurich, TU Munich,
IDMEC Lisbon, BDD in The Hague and the Roswell Park Cancer Institute (USA). The objective of
SOUND is to create the bioinformatic tools for statistically informed use of personal ‘omic data in
medicine, including cancers and rare metabolic diseases. Its partners have a strong track record in, and
future commitment to, Bioconductor (see Section C.3.3). Bioconductor has been exceedingly successful
in enabling researchers to analyse the ‘omic datasets of the past, and the aim of SOUND is to help move
Bioconductor forward to enable physician-scientists and biological researchers to effectively mine the
pervasive longitudinal multi-omic data of the future.
C.1.2
Single-Cell Resolution for Ever More Data Types
Many molecular biology technologies were developed to work on bulk samples, i. e., on populations of
millions of cells and billions of molecules. These numbers are coming down. In 2015, single-cell RNA
sequencing for tens of thousands of cells (drop-seq) and the parallel sequencing of the same single
cell’s RNA and DNA-methylation status were reported15 . Other assays (e. g., transposase-accessible
chromatin, ATAC-seq) are sure to follow. New developments in chemical biology, fluorescent probes
and super-resolution microscopy are beginning to enable the spatial localization and quantification of
specific RNA (and DNA) sequences at single molecule resolution. For the statistician, these data offer
exciting opportunities:
Error modeling – the technologies will have imperfect sensitivities and specificities. False positives
and false negatives will not occur randomly, but often depend on biophysical biases (e. g., sequence,
internal state, environment) that need to be discovered, quantitatively modelled and estimated.
Signal processing – there is a need for designing clever codes (e. g., molecular barcodes) and to later
deconvolute them, possibly in complex combinatorial ways and in the presence of error; see, e. g., the
work by Xiaowei Zhuang’s lab on spatially resolved multiplexed RNA profiling in single cells16 .
Beyond averages – we will get variances and indeed full distributions, which need to be accurately and
robustly estimated, and compared with one another (e. g., between cells with and without a stimulus).
Patterns – what is noise, what is systematic behaviour? Variations that cancel out on average may or
may not be actively regulated and systematic within single cells, and reveal important mechanisms. We
addressed an instance of this question in [4]. Other examples are fluctuations in protein abundance that
might be correlated by processes ensuring stoichiometry of operational units, or cellular localisation.
C.2
Application Areas: High-Throughput Genetics and Precision Oncology
This line of research is a continuation of our successful work on gene-gene and gene-drug interactions
(Section B.5). I plan to conduct it primarily through two strong, cross-fertilizing collaborations, one with a
technology and cell line model focus and one with a translational and clinical focus.
14
http://www.sound-biomed.eu
15
Angermueller et al. (2016) http://dx.doi.org/10.1038/nmeth.3728
16
Chen et al. (2015) http://dx.doi.org/10.1126/science.aaa6090
[Figure 4: ternary plots. Axis labels recovered from the original figure: BTK (ibrutinib), MEK (selumetinib), MTOR (everolimus) and SYK; axis ticks run from 0 to 100.]
Figure 4: Pharmacogenomics of drug sensitivity. The position of each point in the ternary plots
shows the relative response of a patient-derived primary chronic lymphocytic leukaemia (CLL) sample to each of three drugs (ibrutinib, selumetinib, everolimus) that specifically target three different
signalling kinases (BTK, MEK and MTOR, respectively). The circle size represents the average response of the
sample to all three drugs. The plot highlights pathway-specific dependency distributions. While the
majority of CLL samples with unmutated IGHV locus (left panel) depend about equally strongly on BTK
and MEK activity, the distribution in CLL with mutated IGHV locus (right panel) is more dispersed
and shows a subgroup that responds to MTOR inhibition and less to the other inhibitors.
C.3
Fundamental Problems in Statistics
C.3.1
Data Heterogeneity, Data Missing Not at Random, and Biased Sampling
The data heterogeneity challenge in multi-omics derives from the fact that for different ‘omic layers,
different types of features are interrogated. DNA-related data are reported in chromosomal coordinate
systems. The central dogma links that to RNA- and protein-related data, but the mapping can become arbitrarily complicated due to splicing, paralogy, post-transcriptional and post-translational modifications.
Moreover, these processes may themselves be affected by the treatment of interest or differ between individuals. For metabolites and drugs, the link to the other coordinate systems is even less well defined.
Moreover, even though in the simplest case all levels of multi-omic data are measured simultaneously on
the exact same samples, in practice they may be taken at more or less different body sites or with more or
less time between them. Altogether, this means that while ’old’ omic data can be conveniently modelled
by a 2D matrix (features × samples), multi-omic data are more complex than adding a 3rd dimension
to the matrix: the mappings between features and samples at different levels are fiddly, dynamic and
uncertain. We work on concepts, algorithms and software to address such challenges.
Sampling is at the basis of much of statistics: voter polls are made not by asking everyone who
will vote, but from a sufficiently large and representative sample. Similarly, in RNA-seq or ChIP-seq
we do not sequence every DNA molecule that is theoretically available. Complications start when the
sampling is biased. If the bias is precisely known, one can try to adjust for it. But in most cases,
detecting, modelling and quantifying the important biases is part of the analyst’s task. Furthermore, she
can feed such observations back ’upstream’ to improve technologies and experimental designs.
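As a minimal illustration of adjusting for a precisely known sampling bias, the following Python sketch simulates a poll in which one group is four times more likely to be sampled than the other, and then reweights each respondent by the inverse of their inclusion probability (a Hájek-type weighted mean). The population, the group names, the support rates and the inclusion probabilities are all hypothetical values chosen for illustration, not data from our work.

```python
import random


def hajek_mean(values, inclusion_probs):
    """Weighted mean with weights 1/pi, where pi is the known inclusion probability."""
    weights = [1.0 / p for p in inclusion_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


rng = random.Random(1)

# Hypothetical population: two equally likely groups with different support rates.
population = []
for _ in range(20000):
    urban = rng.random() < 0.5
    support = 1 if rng.random() < (0.7 if urban else 0.3) else 0
    pi = 0.20 if urban else 0.05  # assumed, precisely known sampling bias
    population.append((support, pi))

# Biased sampling: each individual enters the sample with probability pi.
sample = [(y, pi) for y, pi in population if rng.random() < pi]

truth = sum(y for y, _ in population) / len(population)
naive = sum(y for y, _ in sample) / len(sample)
adjusted = hajek_mean([y for y, _ in sample], [pi for _, pi in sample])

print(f"truth={truth:.3f} naive={naive:.3f} adjusted={adjusted:.3f}")
```

In this toy setting the naive sample mean is pulled towards the over-sampled group, while the reweighted estimate stays close to the population value. In real applications the inclusion probabilities are rarely known and have to be modelled, which is the harder part of the problem.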
A related problem is data missing not at random. For instance, in single-cell RNA-seq, some genes
may go undetected and unreported due to low abundance, but the probability of such drop-out events
may depend on biochemical and biophysical factors in complex ways.
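To make the notion of data missing not at random concrete, here is a small Python simulation, offered as a sketch under stated assumptions rather than as a model of any real assay. It assumes that the detection probability of a value follows a logistic function of its true log-abundance; the function name `simulate_dropout` and the parameters `midpoint` and `steepness` are hypothetical.

```python
import math
import random


def simulate_dropout(true_values, midpoint=1.0, steepness=2.0, rng=None):
    """Return the observed values, with None wherever a drop-out occurred.

    Assumption: detection probability is logistic in the true value, so
    low-abundance values are lost more often than high-abundance ones.
    """
    rng = rng or random.Random(0)
    observed = []
    for x in true_values:
        p_detect = 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))
        observed.append(x if rng.random() < p_detect else None)
    return observed


rng = random.Random(42)
truth = [rng.gauss(1.0, 1.0) for _ in range(20000)]  # hypothetical log-abundances
observed = simulate_dropout(truth, rng=rng)
detected = [x for x in observed if x is not None]

true_mean = sum(truth) / len(truth)
naive_mean = sum(detected) / len(detected)
detected_fraction = len(detected) / len(truth)

print(f"true mean={true_mean:.2f}, mean of detected values={naive_mean:.2f}, "
      f"detected fraction={detected_fraction:.2f}")
```

Because the missingness depends on the unobserved value itself, simply averaging the detected values overestimates the true mean; ignoring the drop-outs is therefore not a neutral choice.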
All of these challenges require deep engagement with the data, good mechanistic understanding of
the data generating biology and technologies, but also of the downstream inferential expectations and,
not least, mastery of statistical tools including visualisation and regression modelling.
C.3.2
The Importance of Structure
If we want to estimate any kind of statistical or biophysical model in the high-dimensional setting,
we need to impose additional structure onto the data. For the past twenty years, sparsity has been
a popular and powerful structural assumption. The lasso is a popular incarnation of this, but more
abstractly, the whole multiple testing field has made the same assumption: “only a few genes are truly
differentially expressed”. Imposing such structural assumptions manifests itself in making intractable
problems tractable and providing interpretable statistical results.
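The effect of a sparsity assumption can be sketched in a few lines of Python. In the special case of an orthonormal design, the lasso estimate is obtained by soft-thresholding each least-squares coefficient; the sketch below shows that operation. The coefficient values and the penalty `lam` are made-up numbers for illustration only.

```python
def soft_threshold(z, lam):
    """Shrink z towards zero by lam and set it to exactly zero if |z| <= lam."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares estimates: two strong effects, the rest mostly noise.
ols_estimates = [3.2, -0.1, 0.05, -2.5, 0.3, 0.0, 0.12]
lam = 0.5  # assumed penalty strength

lasso_estimates = [soft_threshold(b, lam) for b in ols_estimates]
n_selected = sum(1 for b in lasso_estimates if b != 0.0)

print(lasso_estimates)
print(f"{n_selected} of {len(ols_estimates)} coefficients remain non-zero")
```

Small coefficients are set exactly to zero, which yields an interpretable short list of features, while large coefficients are shrunk by the same amount.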
Nevertheless, blindly using a sparsity assumption can lead us astray, especially in heterogeneous settings. We need to apply our accumulated biological knowledge to infer structural patterns. For instance,
known signalling or metabolic pathways can impose natural structures on genetic or metabolomic
datasets.
We will continue to develop regularization strategies based on prior biological knowledge. I am
particularly excited about developing methods that can learn or update structural assumptions in a data-driven way (our recent work [68] is a step in this direction). With the plethora of datasets available, we
can use these in an empirical Bayes fashion. Such an approach would enable iterative rounds of algorithm
and model improvement, and data mining for new discovery.
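One simple way in which prior knowledge can enter a multiple testing procedure is through hypothesis weights. The Python sketch below implements a weighted Benjamini-Hochberg procedure with fixed, user-supplied weights. It is not the method of [68], which learns the weights from the data; the p-values and weights are hypothetical, and the function name `weighted_bh` is chosen for illustration.

```python
def weighted_bh(pvalues, weights, alpha=0.05):
    """Weighted Benjamini-Hochberg: apply BH to p_i / w_i, with weights rescaled to mean 1."""
    m = len(pvalues)
    mean_w = sum(weights) / m
    # Hypotheses with larger weights get their p-values divided by a larger number.
    adjusted = [min(1.0, p / (w / mean_w)) if w > 0 else 1.0
                for p, w in zip(pvalues, weights)]
    order = sorted(range(m), key=lambda i: adjusted[i])
    # Find the largest rank k whose weighted p-value is below alpha * k / m.
    k = 0
    for rank, i in enumerate(order, start=1):
        if adjusted[i] <= alpha * rank / m:
            k = rank
    rejected = [False] * m
    for i in order[:k]:
        rejected[i] = True
    return rejected


pvalues = [0.01, 0.04, 0.04, 0.9]           # hypothetical p-values
uniform = [1.0, 1.0, 1.0, 1.0]              # no prior knowledge
informed = [1.0, 2.0, 0.5, 0.5]             # assumed prior: hypothesis 2 is more plausible

unweighted_result = weighted_bh(pvalues, uniform)
weighted_result = weighted_bh(pvalues, informed)
print(unweighted_result)
print(weighted_result)
```

With uniform weights the procedure reduces to ordinary Benjamini-Hochberg. In this toy example, up-weighting the second hypothesis leads to one additional rejection at the same nominal level.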
C.3.3
Translational Statistics: Bioconductor
This is one of the most difficult open problems in statistical research: how to rapidly produce robust
software that solves a burning scientific question and share it with biomedical scientists.
This question has been driving my research since I started in bioinformatics over 15 years ago, and
my approach is embedded in the international Bioconductor collaboration [1]. Much of the infrastructure of the Bioconductor project (archive, build system, website) is managed by Martin Morgan at the
Roswell Park Cancer Institute in Buffalo, NY. Its scientific content, however, is driven by groups in
multiple locations. I have a track record in algorithm development and flagship biological applications
and I aim to maintain this role.
• I will maintain the software engineering work in my group. Our aim is to increase the usability
of scientific software in terms of documentation, performance, robustness and interoperability.
• I will continue to organise interdisciplinary training courses, such as the Brixen and EMBO
courses (see Section F).
• I will continue to organise the European Bioconductor Developer Workshops and help with similar events in other parts of the world.
Scientific computing evolves rapidly. Although my work is strongly associated with R, this is no
dogma. Our challenge will be to provide effective software platforms for computational biology in
the medium term future, while safeguarding the investments that have been made (e.g. into R, CRAN
and Bioconductor). Notably, over the last few years R has turned from an academic curiosity into a
commercial-grade infrastructure17 . This is excellent news for bioinformatics since the field will benefit
from enormous commercial investments that would be unimaginable with research funding. Nevertheless, I will also keenly monitor developments on other fronts, such as Julia and JavaScript18 .
Particular fields of focus of future work will be data wrangling, cloud computing and visualisation.
Data wrangling is the process of converting data from one (raw) form into another form that allows
for consumption of the data by downstream tools for analysis and integration. It often takes the
majority of time of an applied analysis project19 . There has recently been remarkable progress in this
area, epitomized by the Hadleyverse20 . Our particular challenge will be to merge the useful concept of
tidy data21 with concepts that have made Bioconductor successful, including self-contained and self-documenting data sets, encapsulation, abstraction and provision of sufficient metadata.
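The tension between the two representations can be illustrated with a small Python sketch. It converts a features × samples table plus separate sample metadata into tidy form, with one row per observation. The gene names, sample names, counts and the helper `to_tidy` are hypothetical and only serve to show the reshaping step.

```python
def to_tidy(counts, sample_info):
    """Flatten a features x samples table into one record per (gene, sample) pair."""
    rows = []
    for gene, per_sample in counts.items():
        for sample, value in per_sample.items():
            row = {"gene": gene, "sample": sample, "count": value}
            row.update(sample_info[sample])  # attach the sample-level metadata
            rows.append(row)
    return rows


# Hypothetical matrix-like container: counts indexed by gene, then by sample.
counts = {
    "geneA": {"s1": 10, "s2": 0},
    "geneB": {"s1": 5, "s2": 7},
}
# Hypothetical sample metadata, stored once per sample.
sample_info = {
    "s1": {"condition": "control"},
    "s2": {"condition": "treated"},
}

tidy = to_tidy(counts, sample_info)
for row in tidy:
    print(row)
```

The tidy table is convenient for generic plotting and modelling tools, but it repeats the sample metadata in every row and no longer enforces that counts and metadata refer to the same set of samples. A self-contained container keeps that guarantee.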
17
As evident e.g. from the formation of the R consortium, the acquisition of Revolution by Microsoft, the professional refinement of R by RStudio, or the fact that leading high-tech companies including Facebook, Google, SAP hire R programmers.
18
There is a friendly relationship between R and JavaScript as both derive from LISP / Scheme.
19
http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html
20
http://www.r-bloggers.com/welcome-to-the-hadleyverse
21
Tidy Data. Hadley Wickham, Journal of Statistical Software 59:10 (2014)
Cloudification of resources is a general trend in the computing world that offers cost savings and
increased efficiency. Naturally, it is also affecting bioinformatics. I see our role here not to invent, but
to lead the field by showing how to adapt and specialise generic solutions from the software industry. A
recent example is our provision of Docker containers for an RNA-seq workflow22 .
Scientific visualisation has so far been remarkably conservative, presumably due to the overall conservativeness of the scientific publication process, which is still centred around “papers” (equivalently:
self-contained, printable PDF files). Nevertheless, future generations may learn to make better use of
interactive, computer-aided visualisations and modern web technologies, and I plan to leverage such
new developments from the wider computing world for scientific data visualisation and exploration.
22
https://hub.docker.com/r/vladkim/rnaseq
D
Publications 2012-16
P indicates equal contributions, B co-corresponding authorships. See also
http://www.huber.embl.de/publications. Bibliometry is available, e. g., from Google Scholar.
Corresponding author papers 2012–16
[1] Orchestrating high-throughput genomic analysis with Bioconductor. Wolfgang HuberB ,
Vincent J. Carey, Robert Gentleman, Simon Anders, Marc Carlson, Benilton S. Carvalho, Hector Corrada Bravo, Sean Davis, Laurent Gatto, Thomas Girke, Raphael Gottardo, Florian Hahne,
Kasper D. Hansen, Rafael A. Irizarry, Michael Lawrence, Michael I. Love, James MacDonald,
Valerie Obenchain, Andrzej K. Oleś, Hervé Pagès, Alejandro Reyes, Paul Shannon, Gordon K.
Smyth, Dan Tenenbaum, Levi Waldron, and Martin Morgan. Nature Methods, 12:115–121, 2015.
pdf, url (35 citations23 ).
[2] A map of directional genetic interactions in a metazoan cell. Bernd FischerP, Thomas SandmannP, Thomas HornP, Maximilian BillmannP, Varun Chaudhary, Wolfgang HuberB, and
Michael BoutrosB. eLife, 4, 2015. pdf, url.
[3] A chemical-genetic interaction map of small molecules using high-throughput imaging in
cancer cells. Marco BreinigP, Felix A. KleinP, Wolfgang HuberB, and Michael BoutrosB.
Molecular Systems Biology, 11(12), 2015. pdf, url.
[4] Single-cell transcriptome analysis reveals coordinated ectopic gene-expression patterns in
medullary thymic epithelial cells. Philip BrenneckeP, Alejandro ReyesP, Sheena PintoP,
Kristin RattayP, Michelle Nguyen, Rita Küchler, Wolfgang HuberB, Bruno KyewskiB, and
Lars M. SteinmetzB. Nature Immunology, 16:933–941, 2015. pdf, url.
[5] TimerQuant: A modelling approach to tandem fluorescent timer design and data interpretation for measuring protein turnover in embryos. Joseph D. Barry, Erika Donà, Darren
Gilmour, and Wolfgang Huber. Development, 143(1):174–179, 2016. pdf, url.
[6] SomaticSignatures: inferring mutational signatures from single-nucleotide variants. Julian S. Gehring, Bernd Fischer, Michael Lawrence, and Wolfgang Huber. Bioinformatics,
31(22):3673–3675, 2015. pdf, url.
[7] FourCSeq: Analysis of 4C sequencing data. Felix A. Klein, Tibor Pakozdi, Simon Anders,
Yad Ghavi-Helm, Eileen E. M. Furlong, and Wolfgang Huber. Bioinformatics, 31(19):3085–
3091, 2015. pdf, url.
[8] RNA-Seq workflow: gene-level exploratory analysis and differential expression. Michael I.
Love, Simon Anders, Vladislav Kim, and Wolfgang Huber. F1000Research, 4(1070), 2015.
pdf, url.
[9] Thermal proteome profiling for unbiased identification of direct and indirect drug targets
using multiplexed quantitative mass spectrometry. Holger FrankenP, Toby MathiesonP,
Dorothee ChildsP, Gavain M.A. SweetmanP, Thilo Werner, Ina Tögel, Carola Doce, Stephan
Gade, Marcus Bantscheff, Gerard Drewes, Friedrich B.M. ReinhardB, Wolfgang HuberB, and
Mikhail M. SavitskiB. Nature Protocols, 10(10):1567–1593, 2015. pdf, url.
[10] Moderated estimation of fold change and dispersion for RNA-Seq data with DESeq2.
Michael I. Love, Wolfgang Huber, and Simon Anders. Genome Biology, 15(12):550, 2014.
pdf, url (112 citations).
23
Source: ISI Web of Science.
[11] HTSeq – a Python framework to work with high-throughput sequencing data. Simon Anders, Paul Theodor Pyl, and Wolfgang Huber. Bioinformatics, 31(2):166–169, 2015. pdf, url
(315 citations).
[12] h5vc: scalable nucleotide tallies with HDF5. Paul Theodor Pyl, Julian Gehring, Bernd Fischer, and Wolfgang Huber. Bioinformatics, 30(10):1464–1466, 2014. pdf, url.
[13] Transcriptome-wide profiling and posttranscriptional analysis of hematopoietic
stem/progenitor cell differentiation toward myeloid commitment. Daniel KlimmeckP,
Nina Cabezas-WallscheidP, Alejandro ReyesP, Lisa von Paleske, Simon Renders, Jenny
Hansson, Jeroen Krijgsveld, Wolfgang HuberB, and Andreas TrumppB. Stem Cell Reports,
3(5):858–875, 2014. pdf, url.
[14] Mapping genetic interactions in human cancer cells with RNAi and multiparametric phenotyping. Christina LauferP, Bernd FischerP, Maximilian Billmann, Wolfgang HuberB, and
Michael BoutrosB. Nature Methods, 10:427–431, 2013. pdf, url (37 citations).
[15] Drift and conservation of differential exon usage across tissues in primate species. Alejandro
ReyesP, Simon AndersP, Robert J. Weatheritt, Toby J. Gibson, Lars M. Steinmetz, and Wolfgang Huber. Proc. Natl. Acad. Sci. U.S.A., 110(38):15377–15382, 2013. pdf, url (11 citations).
[16] Dynamical modelling of phenotypes in a genome-wide RNAi live-cell imaging assay. Gregoire Pau, Thomas Walter, Beate Neumann, Jean-Karim Heriché, Jan Ellenberg, and Wolfgang
Huber. BMC Bioinformatics, 14(1):308, 2013. pdf, url.
[17] The Genomic and Transcriptomic Landscape of a HeLa Cell Line. Jonathan LandryP,
Paul Theodor PylP, Tobias Rausch, Thomas Zichner, Manu M. Tekkedil, Adrian M. Stütz, Anna
Jauch, Raeka S. Aiyar, Gregoire Pau, Nicolas Delhomme, Julien Gagneur, Jan O. Korbel, Wolfgang HuberB, and Lars M. SteinmetzB. G3 (Bethesda), 3(8), 2013. pdf, url (85 citations).
[18] Detecting differential usage of exons from RNA-Seq data. Simon AndersP, Alejandro
ReyesP, and Wolfgang Huber. Genome Research, 22:2008–2017, 2012. pdf, url (170 citations).
Collaborative papers 2012–16
[19] A genetic interaction map of cell cycle regulators. Maximilian BillmannP, Thomas HornP,
Bernd Fischer, Thomas Sandmann, Wolfgang Huber, and Michael Boutros. Molecular Biology
of the Cell, 2016. pdf, url.
[20] Myc depletion induces a pluripotent dormant state mimicking diapause. Roberta Scognamiglio, Nina Cabezas-Wallscheid, Marc Christian Thier, Sandro Altamura, Alejandro Reyes,
Áine M. Prendergast, Daniel Baumgärtner, Larissa S. Carnevalli, Ann Atzberger, Simon Haas,
Lisa von Paleske, Thorsten Boroviak, Philipp Wörsdörfer, Marieke A.G. Essers, Ulrich Kloz,
Robert N. Eisenman, Frank Edenhofer, Paul Bertone, Wolfgang Huber, Franciscus van der Hoeven, Austin Smith, and Andreas Trumpp. Cell, 164(4):668–680, 2016. pdf, url.
[21] Landscape and dynamics of transcription initiation in the malaria parasite Plasmodium falciparum. Sophie H. Adjalley, Christophe D. Chabbert, Bernd Klaus, Vicent Pelechano, and
Lars M. Steinmetz. Cell Reports, 14(10):2463–2475, 2016. pdf, url.
[22] Nuclear architecture organized by Rif1 underpins the replication-timing program. Rossana
Foti, Stefano Gnan, Daniela Cornacchia, Vishnu Dileep, Aydan Bulut-Karslioglu, Sarah Diehl,
Andreas Buness, Felix A. Klein, Wolfgang Huber, Ewan Johnstone, Remco Loos, Paul Bertone,
David M. Gilbert, Thomas Manke, Thomas Jenuwein, and Sara C.B. Buonomo. Molecular Cell,
61(2):260–273, 2016. pdf, url.
[23] CYP3A5 mediates basal and acquired therapy resistance in different subtypes of pancreatic ductal adenocarcinoma. Elisa M Noll, Christian Eisen, Albrecht Stenzinger, Elisa Espinet, Alexander Muckenhuber, Corinna Klein, Vanessa Vogel, Bernd Klaus, Wiebke Nadler,
Christoph Rösli, Christian Lutz, Michael Kulke, Jan Engelhardt, Franziska M Zickgraf, Octavio
Espinosa, Matthias Schlesner, Xiaoqi Jiang, Annette Kopp-Schneider, Peter Neuhaus, Marcus
Bahra, Bruno V Sinn, Roland Eils, Nathalia A Giese, Thilo Hackert, Oliver Strobel, Jens Werner,
Markus W Büchler, Wilko Weichert, Andreas Trumpp, and Martin R Sprick. Nature Medicine,
22:278–287, 2016. pdf, url.
[24] An optogenetic method to modulate cell contractility during tissue morphogenesis. Giorgia
Guglielmi, Joseph D. Barry, Wolfgang Huber, and Stefano De Renzis. Developmental Cell,
35(5):646–660, 2015. pdf, url.
[25] Improved binding site assignment by high-resolution mapping of RNA-protein interactions
using iCLIP. Christian Hauer, Tomaz Curk, Simon Anders, Thomas Schwarzl, Anne-Marie Alleaume, Jana Sieber, Ina Hollerer, Madhuri Bhuvanagiri, Wolfgang Huber, Matthias W. Hentze,
and Andreas E. Kulozik. Nature Communications, 6(7921), 2015. pdf, url.
[26] The RNA-binding proteomes from yeast to man harbour conserved enigmRBPs. Benedikt M.
Beckmann, Rastislav Horos, Bernd Fischer, Alfredo Castello, Katrin Eichelbaum, Anne-Marie
Alleaume, Thomas Schwarzl, Tomaz Curk, Sophia Foehr, Wolfgang Huber, Jeroen Krijgsveld,
and Matthias W. Hentze. Nature Communications, 6(10127), 2015. pdf, url.
[27] Thermal proteome profiling monitors ligand interactions with cellular membrane proteins.
Friedrich B.M. Reinhard, Dirk Eberhard, Thilo Werner, Holger Franken, Dorothee Childs, Carola Doce, Maria Fälth Savitski, Wolfgang Huber, Marcus Bantscheff, Mikhail M. Savitski, and
Gerard Drewes. Nature Methods, 2015. pdf, url.
[28] Mutational landscape and complexity in CLL. Thorsten Zenz and Wolfgang Huber. Blood,
126(18):2078–2079, 2015. pdf, url.
[29] Expression atlas update—an integrated database of gene and protein expression in humans,
animals and plants. Robert Petryszak, Maria Keays, Y. Amy Tang, Nuno A. Fonseca, Elisabet Barrera, Tony Burdett, Anja Füllgrabe, Alfonso Muñoz-Pomer Fuentes, Simon Jupp, Satu
Koskinen, Oliver Mannion, Laura Huerta, Karine Megy, Catherine Snow, Eleanor Williams, Mitra
Barzine, Emma Hastings, Hendrik Weisser, James Wright, Pankaj Jaiswal, Wolfgang Huber, Jyoti
Choudhary, Helen E. Parkinson, and Alvis Brazma. Nucleic Acids Research, 44(1):D746–D752,
2016. pdf, url.
[30] Genetic control of chromatin states in humans involves local and distal chromosomal interactions. Fabian Grubert, Judith B. Zaugg, Maya Kasowski, Oana Ursu, Damek V. Spacek,
Alicia R. Martin, Peyton Greenside, Rohith Srivas, Doug H. Phanstiel, Aleksandra Pekowska,
Nastaran Heidari, Ghia Euskirchen, Wolfgang Huber, Jonathan K. Pritchard, Carlos D. Bustamante, Lars M. Steinmetz, Anshul Kundaje, and Michael Snyder. Cell, 162(5):1051–1065, 2015.
pdf, url.
[31] Recurrent CDKN1B (p27) mutations in hairy cell leukemia. Sascha Dietrich, Jennifer
Hüllein, Stanley Chun-Wei Lee, Barbara Hutter, David Gonzalez, Sandrine Jayne, Martin J. S.
Dyer, Małgorzata Oleś, Monica Else, Xiyang Liu, Mikołaj Słabicki, Bian Wu, Xavier Troussard, Jan Dürig, Mindaugas Andrulis, Claire Dearden, Christof von Kalle, Martin Granzow, Anna
Jauch, Stefan Fröhling, Wolfgang Huber, Manja Meggendorfer, Torsten Haferlach, Anthony D.
Ho, Daniela Richter, Benedikt Brors, Hanno Glimm, Estella Matutes, Omar Abdel Wahab, and
Thorsten Zenz. Blood, 126(8):1005–1008, 2015. pdf, url.
[32] Single-cell polyadenylation site mapping reveals 3’ isoform choice variability. Lars Velten,
Simon Anders, Aleksandra Pekowska, Aino I Järvelin, Wolfgang Huber, Vicent Pelechano,
and Lars M. Steinmetz. Molecular Systems Biology, 11(6), 2015. pdf, url.
[33] BRAF inhibitor therapy in HCL. Sascha Dietrich and Thorsten Zenz. Best Practice & Research
Clinical Haematology, 28(4):246–252, 2015. url.
[34] A high-throughput ChIP-Seq for large-scale chromatin studies. Christophe D Chabbert, Sophie H Adjalley, Bernd Klaus, Emilie S Fritsch, Ishaan Gupta, Vicent Pelechano, and Lars M.
Steinmetz. Molecular Systems Biology, 11(1), 2015. pdf, url.
[35] A novel inflammatory pathway mediating rapid hepcidin-independent hypoferremia. Claudia Guida, Sandro Altamura, Felix A. Klein, Bruno Galy, Michael Boutros, Artur J. Ulmer,
Matthias W. Hentze, and Martina U. Muckenthaler. Blood, 125(14):2265–2275, 2015. pdf, url
(13 citations).
[36] Fundamental physical cellular constraints drive self-organization of tissues. Daniel Sánchez-Gutiérrez, Melda Tozluoglu, Joseph D. Barry, Alberto Pascual, Yanlan Mao, and Luis M Escudero. The EMBO Journal, 35(1):77–88, 2015. pdf, url.
[37] An open data ecosystem for cell migration research. Paola Masuzzo, Lennart Martens,
Christophe Ampe, Kurt I. Anderson, Joseph Barry, Olivier De Wever, Olivier Debeir, Christine
Decaestecker, Helmut Dolznig, Peter Friedl, Cedric Gaggioli, Benjamin Geiger, Ilya G. Goldberg,
Elias Horn, Rick Horwitz, Zvi Kam, Sylvia E. Le Dévédec, Danijela Matic Vignjevic, Josh Moore,
Jean-Christophe Olivo-Marin, Erik Sahai, Susanna A. Sansone, Victoria Sanz-Moreno, Staffan
Strömblad, Jason Swedlow, Johannes Textor, Marleen Van Troys, and Roman Zantl. Trends in
Cell Biology, 25(2):55–58, 2015. pdf, url.
[38] Statistical relevance – relevant statistics, part I. Bernd Klaus. The EMBO Journal,
34(22):2727–2730, 2015. pdf, url.
[39] A discrete transition zone organizes the topological and regulatory autonomy of the adjacent
Tfap2c and Bmp7 genes. Taro Tsujimura, Felix A. Klein, Katja Langenfeld, Juliane Glaser,
Wolfgang Huber, and François Spitz. PLoS Genetics, 11(1):e1004897, 2015. pdf, url.
[40] Cell-to-cell expression variability followed by signal reinforcement progressively segregates
early mouse lineages. Yusuke Ohnishi, Wolfgang Huber, Akiko Tsumura, Minjung Kang, Panagiotis Xenopoulos, Kazuki Kurimoto, Andrzej K. Oleś, Marcos J. Araúzo-Bravo, Mitinori Saitou,
Anna-Katerina Hadjantonakis, and Takashi Hiiragi. Nature Cell Biology, 16(1):27–37, 2014. pdf,
url (49 citations).
[41] Protein quality control at the inner nuclear membrane. Anton Khmelinskii, Marina Pantazopoulou, Bernd Fischer, Deike J. Omnus, Gaëlle Le Dez, Audrey Brossard, Alexander Gunnarsson, Joseph D. Barry, Matthias Meurer, Daniel Kirrmaier, Charles Boone, Wolfgang Huber,
Gwenaël Rabut, Per O. Ljungdahl, and Michael Knop. Nature, 516(7531):410–413, 2014. pdf,
url.
[42] Enhancer loops appear stable during development and are associated with paused polymerase. Yad Ghavi-Helm, Felix A. KleinP, Tibor PakozdiP, Lucia Ciglar, Daan Noordermeer,
Wolfgang Huber, and Eileen E. M. Furlong. Nature, 512(7512):96–100, 2014. pdf, url (51
citations).
P
[43] Identification of regulatory networks in HSCs and their immediate progeny via integrated proteome, transcriptome, and DNA methylome analysis. Nina Cabezas-Wallscheid,
Daniel Klimmeck, Jenny Hansson, Daniel B Lipka, Alejandro Reyes, Qi Wang, Dieter
Weichenhan, Amelie Lier, Lisa von Paleske, Simon Renders, Peer Wünsche, Petra Zeisberger,
David Brocks, Lei Gu, Carl Herrmann, Simon Haas, Marieke A G Essers, Benedikt Brors, Roland
Eils, Wolfgang Huber, Michael D Milsom, Christoph Plass, Jeroen Krijgsveld, and Andreas
Trumpp. Cell Stem Cell, 15(4):507–522, 2014. pdf, url (24 citations).
P
P
P
P
[44] Measuring genetic interactions in human cells by RNAi and imaging. Christina Laufer, Bernd
Fischer, Wolfgang Huber, and Michael Boutros. Nature Protocols, 9(10):2341–2353, 2014. pdf,
url.
[45] Alternative polyadenylation diversifies post-transcriptional regulation by selective RNA–
protein interactions. Ishaan Gupta, Sandra Clauder-Münster, Bernd Klaus, Aino I Järvelin,
Raeka S. Aiyar, Vladimir Benes, Stefan Wilkening, Wolfgang Huber, Vicent Pelechano, and
Lars M. Steinmetz. Molecular Systems Biology, 10(2), 2014. pdf, url (12 citations).
[46] Expression Atlas update–a database of gene and transcript expression from microarray- and
sequencing-based functional genomics experiments. Robert Petryszak, Tony Burdett, Benedetto
Fiorelli, Nuno A. Fonseca, Mar Gonzalez-Porta, Emma Hastings, Wolfgang Huber, Simon Jupp,
Maria Keays, Nataliya Kryvych, Julie McMurry, John C. Marioni, James Malone, Karine Megy,
Gabriella Rustici, Amy Y. Tang, Jan Taubert, Eleanor Williams, Oliver Mannion, Helen E. Parkinson, and Alvis Brazma. Nucleic Acids Research, 42(1):D926–932, 2014. pdf, url (63 citations).
[47] A genome-wide map of mitochondrial DNA recombination in yeast. Emilie S. Fritsch,
Christophe D. Chabbert, Bernd Klaus, and Lars M. Steinmetz. Genetics, 198(2):755–771, 2014.
pdf, url.
[48] Directional tissue migration through a self-generated chemokine gradient. Erika Donà,
Joseph D. Barry, Guillaume Valentin, Charlotte Quirin, Anton Khmelinskii, Andreas Kunze, Sevi
Durdu, Lionel R. Newton, Ana Fernandez-Minan, Wolfgang Huber, Michael Knop, and Darren
Gilmour. Nature, 503(7475):285–289, 2013. pdf, url (58 citations).
[49] Accounting for technical noise in single-cell RNA-seq experiments. Philip BrenneckeP, Simon AndersP, Jong Kyoung KimP, Aleksandra A. Kolodziejczyk, Xiuwei Zhang, Valentina
Proserpio, Bianka Baying, Vladimir Benes, Sarah A. Teichmann, John C. Marioni, and Marcus G.
Heisler. Nature Methods, 10(11):1093–1095, 2013. pdf, url (77 citations).
[50] Count-based differential expression analysis of RNA sequencing data using R and Bioconductor. Simon Anders, Davis J McCarthy, Yunshun Chen, Michal Okoniewski, Gordon K Smyth,
Wolfgang Huber, and Mark D Robinson. Nature Protocols, 8(9):1765–1786, 2013. pdf, url (136
citations).
[51] An Evaluation of High-Throughput Approaches to QTL Mapping in Saccharomyces cerevisiae. Stefan Wilkening, Gen Lin, Emilie S. Fritsch, Manu M. Tekkedil, Simon Anders, Raquel
Kuehn, Michelle Nguyen, Raeka S. Aiyar, Michael Proctor, Nikita A. Sakhanenko, David J. Galas,
Julien Gagneur, Adam Deutschbauer, and Lars M. Steinmetz. Genetics, 196(3):853–865, 2014.
pdf, url (11 citations).
[52] High-content siRNA screen reveals global ENaC regulators and potential cystic fibrosis therapy targets. Joana AlmaçaP, Diana FariaP, Marisa Sousa, Inna Uliyakina, Christian Conrad,
Lalida Sirianant, Luka A. Clarke, José Paulo Martins, Miguel Santos, Jean-Karim Heriché, Wolfgang Huber, Rainer Schreiber, Rainer Pepperkok, Karl Kunzelmann, and Margarida D. Amaral.
Cell, 154(6):1390–1400, 2013. pdf, url (14 citations).
[53] Software for computing and annotating genomic ranges. Michael Lawrence, Wolfgang Huber, Hervé Pagès, Patrick Aboyoun, Marc Carlson, Robert Gentleman, Martin T. Morgan, and
Vincent J. Carey. PLoS Computational Biology, 9(8):e1003118, 2013. pdf, url (92 citations).
[54] CellH5: a format for data exchange in high-content screening. Christoph Sommer, Michael
Held, Bernd Fischer, Wolfgang Huber, and Daniel W. Gerlich. Bioinformatics, 29:1580–1582,
2013. pdf, url.
[55] Shrinkage estimation of dispersion in Negative Binomial models for RNA-seq experiments
with small sample size. Danni Yu, Wolfgang Huber, and Olga Vitek. Bioinformatics, 29:1275–
1282, 2013. pdf, url.
[56] Control of tissue morphology by Fasciclin III-mediated intercellular adhesion. Richard E.
WellsP, Joseph D. BarryP, Simon Cuhlmann, Paul Evans, Wolfgang Huber, David Strutt, and
Martin P. Zeidler. Development, 140:3858–3868, 2013. pdf, url.
[57] Direct competition between hnRNP C and U2AF65 protects the transcriptome from the exonization of Alu elements. Kathi Zarnack, Julian König, Mojca Tajnik, Inigo Martincorena, Sebastian Eustermann, Isabelle Stévant, Alejandro Reyes, Simon Anders, Nicholas M. Luscombe,
and Jernej Ule. Cell, 152(3):453–466, 2013. pdf, url (69 citations).
[58] An efficient method for genome-wide polyadenylation site mapping and RNA quantification. Stefan Wilkening, Vicent Pelechano, Aino I. Järvelin, Manu M. Tekkedil, Simon Anders,
Vladimir Benes, and Lars M. Steinmetz. Nucleic Acids Research, 41(5):e65, 2013. pdf, url (27
citations).
[59] Properties of isotope patterns and their utility for peptide identification in large-scale proteomic experiments. Satoshi Okawa, Bernd Fischer, and Jeroen Krijgsveld. Rapid Communications in Mass Spectrometry, 27(9):1067–1075, 2013. url.
[60] RNA-binding proteins in Mendelian disease. Alfredo Castello, Bernd Fischer, Matthias W
Hentze, and Thomas Preiss. Trends in Genetics, 29:318–327, 2013. pdf, url (43 citations).
[61] System-wide identification of RNA-binding proteins by interactome capture. Alfredo
Castello, Rastislav Horos, Claudia Strein, Bernd Fischer, Katrin Eichelbaum, Lars M. Steinmetz, Jeroen Krijgsveld, and Matthias W Hentze. Nature Protocols, 8(3):491–500, 2013. pdf, url
(26 citations).
[62] Biggest challenges in bioinformatics. Jonathan C Fuller, Pierre Khoueiry, Holger Dinkel,
Kristoffer Forslund, Alexandros Stamatakis, Joseph Barry, Aidan Budd, Theodoros G Soldatos,
Katja Linssen, and Abdul Mateen Rajput. EMBO reports, 14(4):302–304, 2013. pdf, url.
[63] The RNA-binding protein repertoire of embryonic stem cells. S Chul Kwon, Hyerim Yi, Katrin
Eichelbaum, Sophia Föhr, Bernd Fischer, Kwon Tae You, Alfredo Castello, Jeroen Krijgsveld,
Matthias W Hentze, and V Narry Kim. Nature Structural and Molecular Biology, 2013. pdf, url
(69 citations).
[64] Highly coordinated proteome dynamics during reprogramming of somatic cells to pluripotency. Jenny Hansson, Mahmoud Reza Rafiee, Sonja Reiland, Jose M. Polo, Julian Gehring,
Satoshi Okawa, Wolfgang Huber, Konrad Hochedlinger, and Jeroen Krijgsveld. Cell Reports,
2(6):1579–1592, 2012. pdf, url (67 citations).
[65] A cross-platform toolkit for mass spectrometry and proteomics. Matthew C Chambers, Brendan Maclean, Robert Burke, Dario Amodei, Daniel L Ruderman, Steffen Neumann, Laurent Gatto,
Bernd Fischer, Brian Pratt, Jarrett Egertson, Katherine Hoff, Darren Kessner, Natalie Tasman,
Nicholas Shulman, Barbara Frewen, Tahmina A Baker, Mi-Youn Brusniak, Christopher Paulse,
David Creasy, Lisa Flashner, Kian Kani, Chris Moulding, Sean L Seymour, Lydia M Nuwaysir,
Brent Lefebvre, Frank Kuhlmann, Joe Roark, Paape Rainer, Suckau Detlev, Tina Hemenway, Andreas Huhmer, James Langridge, Brian Connolly, Trey Chadick, Krisztina Holly, Josh Eckels,
Eric W Deutsch, Robert L Moritz, Jonathan E Katz, David B Agus, Michael MacCoss, David L
Tabb, and Parag Mallick. Nature Biotechnology, 30(10):918–920, 2012. pdf, url (175 citations).
[66] Insights into RNA biology from an atlas of mammalian mRNA-binding proteins. Alfredo
Castello, Bernd Fischer, Katrin Eichelbaum, Rastislav Horos, Benedikt M. Beckmann,
Claudia Strein, Norman E. Davey, David T. Humphreys, Thomas Preiss, Lars M. Steinmetz, Jeroen
Krijgsveld, and Matthias W. Hentze. Cell, 149:1393–1406, 2012. pdf, url (319 citations).
[67] Tandem fluorescent protein timers for in vivo analysis of protein dynamics. Anton Khmelinskii, Philipp J. Keller, Anna Bartosik, Matthias Meurer, Joseph D. Barry, Balca R. Mardin, Andreas Kaufmann, Susanne Trautmann, Malte Wachsmuth, Gislene Pereira, Wolfgang Huber, Elmar Schiebel, and Michael Knop. Nature Biotechnology, 30:708–714, 2012. pdf, url (49 citations).
Preprints
[68] Data-driven hypothesis weighting increases detection power in big data analytics. Nikolaos
Ignatiadis, Bernd Klaus, Judith Zaugg, and Wolfgang Huber. bioRχiv, 2015. pdf, url.
[69] Neural lineage induction reveals multi-scale dynamics of 3D chromatin organization. Aleksandra Pekowska, Bernd Klaus, Felix Alexander Klein, Simon Anders, Małgorzata Oleś,
Lars M. Steinmetz, Paul Bertone, and Wolfgang Huber. bioRχiv, 2014. pdf, url.
[70] Mutated SF3B1 is associated with transcript isoform changes of the genes UQCC and RPL31
both in CLLs and uveal melanomas. Alejandro Reyes, Carolin Blume, Vicent Pelechano, Petra
Jakob, Lars M. Steinmetz, Thorsten Zenz, and Wolfgang Huber. bioRχiv, 2014. pdf, url.
E
List of External Grants (since 2012)
Duration
Name – Funding Body. Role. Topic
2015-18
36 months
SOUND (Statistical Multi-Omics Understanding) – Collaborative research
project, Horizon 2020 Research and Innovation programme Personalising
Health and Care, European Commission. I am the coordinator and lead one
scientific work-package. Topic: to create the bioinformatic tools for statistically
informed use of personal genomic and other omic data in medicine.
2011-15
60 months
Systems Microscopy – Network of Excellence, FP7-HEALTH-2010, European
Commission. I led three RTD work packages and was part of the Executive
Board. Topic: data-driven modelling of cell biological processes from live cell
imaging data.
2012-15
36 months
Radiant – Collaborative research project, FP7-HEALTH-2010, European
Commission. I was scientific co-coordinator (with Magnus Rattray and Neil
Lawrence) and led two RTD work packages. Topic: statistical methods for
high-throughput sequencing technologies.
2013-15
24 months
BIGDATA (Scalable Statistical Computing for Emerging Omics Data Streams)
– US National Science Foundation (NSF) Mid-scale project: DA: ESCE:
Collaborative Research. I was co-investigator. Topic: scaling statistical methods
and Bioconductor software for large ‘omics data streams.
2015-17
24 months
BioTop (Bioinformatic tool harmonization for personalized cancer care) –
BMBF. I am a co-investigator, responsible for RNA-seq data types. Topic:
standardising methods for analysing high-throughput sequencing data for
translational cancer research.
2016-19
36 months
TRANSCAN GCH-CLL (Translational research on human tumour heterogeneity to overcome recurrence and resistance to therapy) – ERA-NET on
translational Cancer Research (TRANSCAN) project. I am a co-investigator,
responsible for computational aspects. Topic: intra-tumour heterogeneity in
chronic lymphocytic leukaemia.
2014-17
36 months
GSK postdoc fellowship – Cellzome GmbH. Academic partner. Topic: 3-year
postdoc project on developing computational and statistical methods for thermal
proteome profiling.
2015-19
60 months
HD-HuB (Heidelberg Centre for Human Bioinformatics) – BMBF. Co-investigator, contributing a work-package on R/Bioconductor based workflows.
F
Curriculum Vitae
Wolfgang Huber
European Molecular Biology Laboratory (EMBL)
D 69117 Heidelberg
∗ 28.5.1968 in Bad Säckingen
nationality: Germany
www.huber.embl.de
Positions
EMBL: Research group leader
Dec 2011 - present: Senior Scientist
Heidelberg, Mar 2009 - present: Genome Biology Unit
Cambridge (UK), Sep 2004 - Feb 2009: European Bioinformatics Institute (EBI)
DKFZ, Heidelberg, Mar 2000 - Sep 2004: Postdoc cancer transcriptomics
IBM Research, Almaden, San Jose (California), Jun 1998 - Dec 1999: Postdoc cheminformatics
University of Freiburg, Oct 1994 - May 1998: Research and teaching assistant, Faculty of Physics
Univ. Clinic Freiburg, Sep 1991 - Dec 1997: Research assistant, Department of Neurology
Education
1998
Univ. of Freiburg
Dr. rer. nat. (Theoretical Physics)
Thesis Dynamics of strongly driven open quantum systems
1994
Univ. of Freiburg
Diplom (Physics)
Minor in Mathematics (Probability and Statistics)
1990/91
Univ. of Edinburgh
Non-graduating exchange student
Physics
1990
Univ. of Freiburg
Vordiplom (Physics)
Minors in Mathematics and Chemistry
Academic Services – external
Journal reviewing
Bioinformatics, Biostatistics, Cell Reports, EMBO Reports, FEBS Letters, Genome Biology, G3 (Genes | Genomes | Genetics), Genome Research, Methods, Molecular Systems Biology, Nature, Nature Biotechnology, Nature Cell Biology, Nature Methods, Nucleic Acids Research, PLoS ONE, Science, Science Translational Medicine; Programme Committees ECCB 2012, ISMB/ECCB 2013, ECCB 2014
Editorial board
Bioinformatics, Giga Science, F1000Prime
Grant review boards
HFSP Fellowships
Research proposal reviewing
Academy of Finland, ERC, French NCI (INCa), HRCMM, National Science Centre (Poland), Swiss National Science Foundation (SNF), Skolkovo Fund, Stichting Kinderen Kankervrij (Foundation Children Cancerfree), Wellcome Trust, Wiener Wissenschafts-, Forschungs- und Technologiefonds (WWTF), others
Boards
Scientific Advisory Board (SAB) and Technical Advisory Board: Bioconductor Project (2003 - )
SAB: Sophia Genetics S.A. (CH) (2011 - 2015)
SAB: UMR3244 in Institut Curie (F) (2015 - )
SAB: Graduate School of Quantitative Biosciences Munich (2014 - )
SAB (Observer): Expression Atlas at EBI
Executive Board: Systems Microscopy EC FP7 Network of Excellence (2011 - 2015)
Consulting
Genentech (2010 - 2015)
Evotec (2013 - 2014)
Academic Services – within EMBL
Annually since 2007
Coordinator of the ’Omics module of the EMBL International PhD Programme course
2012-2016
Thesis Advisory Committee: >40 students
Conference (co-)organisation
16 - 18 February 2012
EMBL Heidelberg
Omics and Personalised Health, conference (140 participants)
with Lars Steinmetz, Lee Hood and Rudi Balling
7 - 8 June 2014
EMBL Heidelberg
Annual meeting of the RADIANT consortium (37 participants)
with Magnus Rattray
12 - 13 January 2015
EMBL Heidelberg
Bioconductor European Developer Conference (44 participants)
with Martin Morgan
31 May - 5 June 2015
Centro Stefano Franscini, Ascona, CH
Workshop on Statistical Learning of Biological Systems from
Perturbations (55 participants)
with Niko Beerenwinkel, Peter Bühlmann
16 - 19 November 2015
EMBL Heidelberg
Stanford - EMBL conference: Omics and Personalised Health
(150 participants)
with Lars Steinmetz, Judith Zaugg, Michael Snyder, Peer Bork,
Jan Ellenberg
24 - 25 November 2015
CR UK Manchester Institute
C1omics - Single Cell ’Omics (57 participants)
with Magnus Rattray, Crispin Miller
19 - 21 May 2016
DKFZ Heidelberg
Cancer Systems Genetics
with Claudia Scholl, Stefan Fröhling, Michael Boutros
6 - 8 June 2016
EMBL
Perspectives in Translational Medicine, EMBL Partnership Conference
with Plamena Markova, Andreas Kulozik, Luis Serrano, Kjetil
Tasken, Matthias Wilmanns
4 September 2016
The Hague, NL
Clinical Bioinformatics as a Service, ECCB Workshop
with Niko Beerenwinkel, Daniel Stekhoven, Simon Tavaré
Teaching
2 - 6 July 2012
23 - 28 June 2013
22 - 27 June 2014
14 - 19 June 2015
10 - 15 July 2016
CSAMA Summer School: Statistics and Computing in Genome Data Science, Brixen, South Tyrol
17 - 22 October 2012
EMBO Practical Course: Analysis and informatics of transcriptomics data, Shenzhen, China.
24 - 25 January 2013
16 - 16 January 2015
25 - 26 February 2016
EMBL Practical Course: Advanced R programming, EMBL
Heidelberg
9 September 2012
ECCB Tutorial – Reads to Biological Patterns: End-to-End Differential Expression Analysis of RNA Sequencing Data Using
Bioconductor
7 September 2014
ECCB Workshop – Analysis of Differential Isoform Usage by
RNA-seq: Statistical Methodologies and Open Software
3 - 8 March 2013
EMBO Practical Course: High-throughput RNAi, EMBL/DKFZ
Heidelberg
29 October - 3 November 2012
20 - 24 October 2014
19 - 23 October 2015
5 - 9 September 2016
EMBO Practical Course: Analysis and informatics of transcriptomics data, EBI-EMBL, Hinxton, UK
15 - 20 October 2012
20 - 26 October 2014
17 - 22 October 2016
EMBO Practical Course: High-Throughput Microscopy for Systems Biology, EMBL Heidelberg
Above are the courses that I organised or co-organised, with
level of responsibility ranked from top to bottom. I have taught
at others, mentioned below.
Selected speaker invitations (2012-16 only)
29 February 2012, Munich, DE
Genomatix GmbH, internal seminar
20 March 2012, Mainz, DE
University, Institute of Molecular Biology, institute seminar
26 June 2012, Augsburg, DE
University, Institute for Mathematics, institute seminar
28 June 2012, Würzburg, DE
University, Institute for Medical Infection Genomics, RNA-seq
Workshop
23 - 25 July 2012, Seattle, USA
Bioconductor conference
31 August 2012, Cambridge, UK
From Phenotypes to Pathways, conference
10 - 11 October 2012, Cambridge, UK
Literature-Data Integration, workshop
12 - 13 October 2012, Potsdam, DE
From genomes to networks - New developments in complex
disease analysis, annual workshop of the Society for GeneDiagnostics
6 - 7 December 2012, Dresden, DE
Biotec Forum, conference
10 December 2012, Heidelberg, DE
University, Heidelberger Kolloquium Medizinische Biometrie,
Informatik und Epidemiologie
11 December 2012, Heidelberg, DE
NGFN Annual Meeting, conference
13 - 14 December 2012, Zurich, CH
Bioconductor Developer Conference
19 March 2013, Palo Alto, USA
Stanford Genome Technology Centre, institute seminar
20 March 2013, South San Francisco, USA
Genentech Inc., internal seminar
8 April 2013, Barcelona, ES
Institute of Predictive and Personalized Medicine of Cancer
(IMPPC), institute seminar
24 - 27 April 2013, Freiburg, DE
Preclinical models of cancer: Towards enhanced clinical relevance and predictivity, conference
13 May 2013, Lisbon, PT
University, Instituto de Medicina Molecular (IMM), institute
seminar
19 - 24 May 2013, Dagstuhl, DE
Dagstuhl Seminar 13212: Computational Methods Aiding
Early-Stage Drug Design
17 - 19 July 2013, Seattle, USA
Bioconductor Conference
23 July 2013, Berlin, DE
ISMB Workshop: Professional Networks in Bioinformatics
11 - 16 August 2013, Banff, CA
BIRS workshop: Statistical Data Integration Challenges in
Computational Biology: Regulatory Networks and Personalized
Medicine
8 - 11 September 2013, Bertinoro, I
Computational Biology meeting: Computational Cancer Genomics, conference
23 September 2013, Tübingen, DE
Summer School on Machine Learning for Personalized
Medicine
25 September 2013, Stockholm, SE
Karolinska Institutet, institute seminar
13 - 19 October 2013, Bedlewo, PL
Autumn school on Computational Aspects of Gene Regulation
28 October - 3 November 2013, Recife, Brazil
RNA-seq course at Brazilian Symposium on Bioinformatics
9 - 10 December 2013, Cambridge, UK
Bioconductor Developer Conference
12 - 13 December 2013, Cambridge, UK
Quantitative Methods in Gene Regulation, conference
9 - 10 January 2014, Paris, F
Institut Curie, institute seminar
16 January 2014, Münster, DE
Max-Planck-Institute for Molecular Biomedicine, institute seminar
12 May 2014, Heidelberg, DE
Cellzome, internal seminar
2 June 2014, Stockholm, SE
Systems Microscopy, conference
2 July 2014, Saarbrücken, DE
Max-Planck-Institute for Informatics, institute seminar
20 October 2014, Munich, DE
LMU, Gene Centre, institute seminar
29 - 31 October 2014, Stockholm, SE
EMBO Workshop on a Systems-Level View of Cytoskeletal
Function, conference
27 - 28 November 2014, Helsinki, FI
Institute for Molecular Medicine of Finland (FIMM), institute
seminar
29 - 30 January 2015, Zurich, CH
RADIANT workshop
12 - 13 February 2015, Munich, DE
Statistical Methods for Post Genomic Data, conference
17 - 19 February 2015, NYU Abu Dhabi, UAE
Genomics and Systems Biology, conference
25 March 2015, Heidelberg, DE
R User Meeting Rhein-Neckar, workshop
16 - 17 April 2015, Kloster Johannisberg, DE
Cancer Genomics Meets Cancer Proteomics, workshop
9 June 2015, London, UK
Big Data Analytics, conference
20 - 22 July 2015, Seattle, USA
Bioconductor conference
31 July - 2 August 2015, Pozega, Croatia
Summer School of Science
15 - 18 September 2015, Saas-Fee, CH
CERN ROOT 20th anniversary workshop
26 - 27 October 2015, Arlington, USA
NSF workshop on Mathematical Biology
7 - 8 December 2015, Cambridge, UK
Bioconductor Developer Conference
13 January 2016, Heidelberg, DE
University Hospital, Medical Clinic V, institute seminar
22 - 29 January 2016, Bellairs, Barbados
Genetic Networks, workshop
15 February 2016, London, UK
Imperial College BRC Genomics Seminar Series
23 February 2016, Basel, CH
Novartis, internal seminar
8 March 2016, Palo Alto, USA
Stanford University, Department of Statistics, institute seminar
9 March 2016, Claremont, USA
Harvey Mudd College, Biology Colloquium
10 March 2016, Berkeley, USA
UC Berkeley, Department of Statistics, Statistics & Genomics
Seminar
16 March 2016, Santa Cruz, USA
UC Santa Cruz, institute seminar
18 March 2016, Mountain View, USA
23andMe, internal seminar
23 March 2016, Palo Alto, USA
Stanford Genome Technology Center, institute seminar
14 April 2016, Utrecht, NL
Centre for Molecular Medicine, institute seminar
22 April 2016, Mainz, DE
Institute for Medical Biometry, Epidemiology and Informatics
(IMBEI), symposium
25 - 27 April 2016, Copenhagen, DK
MedBioinformatics, conference
2 May 2016, Paris, F
High Energy Physics Software Foundation, workshop
30 May - 3 June 2016, Paris, F
École Analyse Génome Tumoral, summer school
Software
See also http://www.huber.embl.de/software
Primary author and maintainer
vsn: microarray normalisation [154]
cellHTS, cellHTS2: RNAi screen normalisation and quality
control [123]
tilingArray: transcript discovery and mapping [124]
arrayQualityMetrics: interactive microarray data quality reports [105]
Initiation, co-authorship, supervision
DESeq, DESeq2: RNA-seq differential expression [10] [82]
htseq: processing reads from high-throughput sequencing [11]
IHW: Independent hypothesis weighting [68]
lpsymphony: mixed integer-linear program solver
biomaRt: programmatic access to BioMarts [131]
EBImage: image processing in R [84]
DEXSeq: detecting differential usage of exons from RNA-seq
data [18]
h5vc: scalable nucleotide tallies with HDF5 [12]
rhdf5: HDF5 interface to R
FourCSeq: analysis of 4C sequencing data [7]
SomaticSignatures: inferring mutational signatures from single-nucleotide variants [6]
BiocStyle: document formatting for executable documents
Publications from before 2012
[71] Mapping of signalling networks through synthetic genetic interaction analysis by RNAi.
Thomas Horn, Thomas Sandmann, Bernd Fischer, Elin Axelsson, Wolfgang Huber, and
Michael Boutros. Nature Methods, 8(4), 2011. pdf, url (74 citations).
[72] Antisense expression increases gene expression variability and locus interdependency.
Zhenyu Xu, Wu Wei, Julien Gagneur, Sandra Clauder-Münster, Miłosz Smolik, Wolfgang
Huber, and Lars M. Steinmetz. Molecular Systems Biology, 7, 2011. pdf, url (65 citations).
[73] cAMP response element-binding protein is a primary hub of activity-driven neuronal gene
expression. E. Benito, L. M. Valor, M. Jimenez-Minchan, W. Huber, and A. Barco. Journal of
Neuroscience, 31:18237–18250, 2011. pdf, url (31 citations).
[74] Genome-wide survey of post-meiotic segregation during yeast recombination. Eugenio
Mancera, Richard Bourgon, Wolfgang Huber, and Lars M. Steinmetz. Genome Biology, 12:R36,
2011. pdf, url (10 citations).
[75] Contributions of the EMERALD project to assessing and improving microarray data quality. Vidar Beisvåg, Audrey Kauffmann, James Malone, Carole Foy, Marc Salit, Heinz Schimmel, Erik Bongcam-Rudloff, Ulf Landegren, Helen Parkinson, Wolfgang Huber, Alvis Brazma,
Arne K. Sandvik, and Martin Kuiper. BioTechniques, 50:27–31, 2011. pdf, url.
[76] Enterotypes of the human gut microbiome. Mani Arumugam, Jeroen Raes, E. Pelletier,
D. Le Paslier, T. Yamada, D. R. Mende, G. R. Fernandes, J. Tap, T. Bruls, J. M. Batto, M. Bertalan,
N. Borruel, F. Casellas, L. Fernandez, L. Gautier, T. Hansen, M. Hattori, T. Hayashi, M. Kleerebezem, K. Kurokawa, M. Leclerc, F. Levenez, C. Manichanh, H. B. Nielsen, T. Nielsen, N. Pons,
J. Poulain, J. Qin, T. Sicheritz-Ponten, S. Tims, D. Torrents, E. Ugarte, E. G. Zoetendal, J. Wang,
F. Guarner, O. Pedersen, W. M. de Vos, S. Brunak, J. Dore, J. Weissenbach, S. D. Ehrlich, Peer
Bork, Metagenomics Consortium:, M. Antolin, F. Artiguenave, H. M. Blottiere, M. Almeida,
C. Brechot, C. Cara, C. Chervaux, A. Cultrone, C. Delorme, G. Denariaz, R. Dervyn, K. U. Foerstner, C. Friss, M. van de Guchte, E. Guedon, F. Haimet, Wolfgang Huber, J. van Hylckama-Vlieg,
A. Jamet, C. Juste, G. Kaci, J. Knol, O. Lakhdari, S. Layec, K. Le Roux, E. Maguin, A. Merieux,
R. Melo Minardi, C. M’rini, J. Muller, R. Oozeer, J. Parkhill, P. Renault, M. Rescigno, N. Sanchez,
S. Sunagawa, A. Torrejon, K. Turner, G. Vandemeulebrouck, E. Varela, Y. Winogradsky, and
G. Zeller. Nature, 473:174–180, 2011. pdf, url (141 citations).
[77] Assessing Affymetrix GeneChip microarray quality. Matthew M. McCall, Peter N. Murakami,
Margus Lukk, Wolfgang Huber, and Rafael A. Irizarry. BMC Bioinformatics, 12:137, 2011. pdf,
url (23 citations).
[78] Polymorphisms in CTNNBL1 in relation to colorectal cancer with evolutionary implications.
S. Huhn, D. Ingelfinger, J. L. Bermejo, M. Bevier, B. Pardini, A. Naccarati, V. Steinke, N. Rahner,
E. Holinski-Feder, M. Morak, H. K. Schackert, H. Gorgens, C. P. Pox, T. Goecke, M. Kloor, M. Loeffler, R. Buttner, L. Vodickova, J. Novotny, K. Demir, C. M. Cruciat, R. Renneberg, W. Huber,
C. Niehrs, M. Boutros, P. Propping, P. Vodicka, K. Hemminki, and A. Forsti. Int J Mol Epidemiol
Genet, 2:36–50, 2011. pdf, url.
[79] Extracting quantitative genetic interaction phenotypes from matrix combinatorial RNAi.
Elin Axelsson, Thomas Sandmann, Thomas Horn, Michael Boutros, Wolfgang Huber, and
Bernd Fischer. BMC Bioinformatics, 12:342, 2011. pdf, url.
[80] Relating CNVs to transcriptome data at fine-resolution: assessment of the effect of variant
size, type, and overlap with functional regions. Andreas Schlattl, Simon Anders, Sebastian M.
Waszak, Wolfgang Huber, and Jan O. Korbel. Genome Research, 21:2004–2013, 2011. pdf, url
(42 citations).
[81] Independent filtering increases detection power for high-throughput experiments. Richard
Bourgon, Robert Gentleman, and Wolfgang Huber. PNAS, 107(21):9546–9551, 2010. pdf, url
(152 citations).
[82] Differential expression analysis for sequence count data. Simon Anders and Wolfgang Huber.
Genome Biology, 11:R106, 2010. pdf, url (2460 citations).
[83] Clustering phenotype populations by genome-wide RNAi and multiparametric imaging. Florian Fuchs, Gregoire Pau, Dominique Kranz, Oleg Sklyar, Christoph Budjan, Sandra Steinbrink, Thomas Horn, Angelika Pedal, Wolfgang Huber, and Michael Boutros. Molecular Systems
Biology, 6(370), 2010. pdf, url (59 citations).
[84] EBImage – an R package for image processing with applications to cellular phenotypes. Gregoire Pau, Florian Fuchs, Oleg Sklyar, Michael Boutros, and Wolfgang Huber. Bioinformatics,
26:979–981, 2010. pdf, url (60 citations).
[85] Genome-wide analysis of mRNA decay patterns during early Drosophila development. Stefan Thomsen, Simon Anders, Sarath Chandra Janga, Wolfgang Huber, and Claudio R. Alonso.
Genome Biology, 11:R93, 2010. pdf, url (36 citations).
[86] Phenotypic profiling of the human genome by time-lapse microscopy reveals cell division
genes. Beate Neumann, Thomas Walter, Jean-Karim Heriché, Jutta Bulkescher, Holger Erfle,
Christian Conrad, Phill Rogers, Ina Poser, Michael Held, Urban Liebel, Cihan Cetin, Frank
Sieckmann, Gregoire Pau, Rolf Kabbe, Annelie Wuensche, Venkata Satagopam, Michael H. A.
Schmitz, Catherine Chapuis, Daniel W. Gerlich, Reinhard Schneider, Roland Eils, Wolfgang Huber, Jan-Michael Peters, Anthony A. Hyman, Richard Durbin, Rainer Pepperkok, and Jan Ellenberg. Nature, 464(7289):721–727, 2010. pdf, url (348 citations).
[87] CellCognition: time-resolved phenotype annotation in high-throughput live cell imaging.
Michael Held, M. H. Schmitz, Bernd Fischer, Thomas Walter, Beate Neumann, M. H. Olma,
M. Peter, Jan Ellenberg, and Daniel W. Gerlich. Nature Methods, 7(9):747–754, 2010. pdf, url
(93 citations).
[88] Addressing accuracy and precision issues in iTRAQ quantitation. Natasha A. Karp, Wolfgang
Huber, Pawel G. Sadowski, Philip D. Charles, Svenja V. Hester, and Kathryn S. Lilley. Molecular
and Cellular Proteomics, 9:1885–97, 2010. pdf, url (176 citations).
[89] Organelle proteomics experimental designs and analysis. Laurent Gatto, Juan Antonio
Vizcaíno, Henning Hermjakob, Wolfgang Huber, and Kathryn S. Lilley. Proteomics, 2010. pdf,
url (23 citations).
[90] High-resolution transcription atlas of the mitotic cell cycle in budding yeast. Marina V. Granovskaia, Lars J. Jensen, Matthew E. Ritchie, Jörn Tödling, Ye Ning, Peer Bork, Wolfgang
Huber, and Lars M. Steinmetz. Genome Biology, 11:R24, 2010. pdf, url (40 citations).
[91] Variation in transcription factor binding among humans. Maya Kasowski, Fabian Grubert,
Christopher Heffelfinger, Manoj Hariharan, Akwasi Asabere, Sebastian M. Waszak, Lukas Habegger, Joel Rozowsky, Minyi Shi, Alexander E. Urban, Mi-Young Hong, Konrad J. Karczewski,
Wolfgang Huber, Sherman M. Weissman, Mark B. Gerstein, Jan O. Korbel, and Michael Snyder.
Science, 328:232–235, 2010. pdf, url (266 citations).
[92] Microarray data quality control improves the detection of differentially expressed genes.
Audrey Kauffmann and Wolfgang Huber. Genomics, 95:138–142, 2010. pdf, url (28 citations).
[93] A large-scale RNAi screen identifies Deaf1 as a regulator of innate immune responses in
Drosophila. David Kuttenkeuler, Nadege Pelte, Anan Ragab, Viola Gesellchen, Lena Schneider, Claudia Blass, Elin Axelsson, Wolfgang Huber, and Michael Boutros. Journal of Innate
Immunity, 2:181–194, 2010. pdf, url (20 citations).
[94] Comparison of normalization methods for Illumina BeadChip(R) HumanHT-12 v3. Ramona
Schmid, Patrick Baum, Carina Ittrich, Katrin Fundel-Clemens, Wolfgang Huber, Benedikt Brors,
Roland Eils, Andreas Weith, Detlev Mennerich, and Karsten Quast. BMC Genomics, 11:349,
2010. pdf, url (31 citations).
[95] A global map of human gene expression. Margus Lukk, Misha Kapushesky, Janne Nikkila,
Helen Parkinson, Angela Goncalves, Wolfgang Huber, Esko Ukkonen, and Alvis Brazma. Nature
Biotechnology, 28:322–324, 2010. pdf, url (156 citations).
[96] Bidirectional promoters generate pervasive transcription in yeast. Zhenyu Xu, Wu Wei,
Julien Gagneur, Fabiana Perocchi, Sandra Clauder-Muenster, Jurgi Camblong, Elisa Guffanti,
Francoise Stutz, Wolfgang Huber, and Lars M. Steinmetz. Nature, 457(7232):1033–1037, 2009.
pdf, url (376 citations).
[97] High-resolution mapping of meiotic crossovers and non-crossovers in yeast. Eugenio
Mancera, Richard Bourgon, Alessandro Brozzi, Wolfgang Huber, and Lars M. Steinmetz.
Nature, 454(7203):479–485, 2008. pdf, url (253 citations).
[98] The hwriter package. Gregoire Pau and Wolfgang Huber. The R Journal, 1(1):22–24, 2009.
pdf, url.
[99] Array-based genotyping in S. cerevisiae using semi-supervised clustering. Richard Bourgon,
Eugenio Mancera, Alessandro Brozzi, Lars M. Steinmetz, and Wolfgang Huber. Bioinformatics,
25(8):1056–1062, 2009. pdf, url.
[100] Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Steffen Durinck, Paul T. Spellman, Ewan Birney, and Wolfgang Huber. Nature
Protocols, 4(8):1184–1191, 2009. pdf, url (108 citations).
[101] Visualisation of genomic data with the Hilbert curve. Simon Anders. Bioinformatics,
25:1231–1235, 2009. pdf, url (27 citations).
[102] ShortRead: a Bioconductor package for input, quality assessment and exploration of high-throughput sequence data. Martin Morgan, Simon Anders, Michael Lawrence, Patrick Aboyoun, Hervé Pagès, and Robert Gentleman. Bioinformatics, 25:2607, 2009. pdf, url (115 citations).
[103] Genome-wide allele- and strand-specific expression profiling. Julien Gagneur, Himanshu
Sinha, Fabiana Perocchi, Richard Bourgon, Wolfgang Huber, and Lars M. Steinmetz. Molecular Systems Biology, 5:274, 2009. pdf, url (22 citations).
[104] Quality assessment and data analysis for microRNA expression arrays. Deepayan Sarkar,
R. Parkin, S. Wyman, A. Bendoraite, C. Sather, J. Delrow, A. K. Godwin, C. Drescher, Wolfgang
Huber, Robert Gentleman, and Munesh Tewari. Nucleic Acids Research, 37(2), 2009. pdf, url (29
citations).
[105] arrayQualityMetrics - a Bioconductor package for quality assessment of microarray data.
Audrey Kauffmann, Robert Gentleman, and Wolfgang Huber. Bioinformatics, 25:415–416,
2009. pdf, url (17 citations).
[106] Importing ArrayExpress datasets into R/Bioconductor. Audrey Kauffmann, Tim F. Rayner,
Helen Parkinson, Misha Kapushesky, Margus Lukk, Alvis Brazma, and Wolfgang Huber. Bioinformatics, 25:2092–2094, 2009. pdf, url (17 citations).
[107] Analyzing ChIP-chip data using Bioconductor. Jörn Tödling and Wolfgang Huber. PLoS
Computational Biology, 4(11), 2008. pdf, url (13 citations).
[108] Rintact: enabling computational analysis of molecular interaction data from the IntAct
repository. Tony Chiang, Nianhua Li, Sandra Orchard, Samuel Kerrien, Henning Hermjakob,
Robert Gentleman, and Wolfgang Huber. Bioinformatics, 24(8):1100–1101, 2008. pdf, url.
[109] Model-based variance-stabilizing transformation for Illumina microarray data. Simon M.
Lin, Pan Du, Wolfgang Huber, and Warren A. Kibbe. Nucleic Acids Res, 36(2), 2008. pdf, url
(231 citations).
[110] Combinatorial effects of four histone modifications in transcription and differentiation.
Jenny J. Fischer, Jörn Tödling, Tammo Krüger, Markus Schüler, Wolfgang Huber, and Silke
Sperling. Genomics, 91(1):41–51, 2008. pdf, url (23 citations).
[111] Estimating node degree in bait-prey graphs. Denise Scholtens, Tony Chiang, Wolfgang Huber, and Robert Gentleman. Bioinformatics, 24(2):218–224, 2008. pdf, url (10 citations).
[112] Florian Hahne, Wolfgang Huber, Robert Gentleman, and Seth Falcon. Bioconductor Case
Studies. Use R. Springer, 2008. pdf, url (92 citations).
[113] Coverage and error models of protein-protein interaction data by directed graph analysis.
Tony Chiang, Denise Scholtens, Deepayan Sarkar, Robert Gentleman, and Wolfgang Huber.
Genome Biology, 8(9), 2007. pdf, url (23 citations).
[114] Making the most of high-throughput protein-interaction data. Robert Gentleman and Wolfgang Huber. Genome Biology, 8(10):112–112, 2007. pdf, url (24 citations).
[115] Graphs in molecular biology. Wolfgang Huber, Vincent J. Carey, Li Long, Seth Falcon, and
Robert Gentleman. BMC Bioinformatics, 8(Suppl. 6), 2007. pdf, url (42 citations).
[116] Ringo – an R/Bioconductor package for analyzing ChIP-chip readouts. Jörn Tödling, Oleg
Sklyar, Tammo Krüger, Jenny J. Fischer, Silke Sperling, and Wolfgang Huber. BMC Bioinformatics, 8:221–221, 2007. pdf, url.
[117] In situ analysis of cross-hybridisation on microarrays and the inference of expression correlation. Tineke Casneuf, Yves Van de Peer, and Wolfgang Huber. BMC Bioinformatics, 8:461–
461, 2007. pdf, url (40 citations).
[118] CoCo: a web application to display, store and curate ChIP-on-chip data integrated with
diverse types of gene expression data. Charles Girardot, Oleg Sklyar, Sophie Grosz, Wolfgang
Huber, and Eileen E. M. Furlong. Bioinformatics, 23(6):771–773, 2007. pdf, url.
[119] Genomic organization of transcriptomes in mammals: Coregulation and cofunctionality. Antje Purmann, Jörn Tödling, Markus Schüler, Piero Carninci, Hans Lehrach, Yoshihide
Hayashizaki, Wolfgang Huber, and Silke Sperling. Genomics, 89(5):580–587, 2007. pdf, url (32
citations).
[120] High-throughput flow cytometry-based assay to identify apoptosis-inducing proteins. Mamatha Sauermann, Florian Hahne, Christian Schmidt, Meher Majety, Heiko Rosenfelder,
Stephanie Bechtel, Wolfgang Huber, Annemarie Poustka, Dorit Arlt, and Stefan Wiemann. Journal of Biomolecular Screening, 12(4):510–520, 2007. pdf, url.
[121] Comparative analysis of structured RNAs in S. cerevisiae indicates a multitude of different
functions. Stephan Steigele, Wolfgang Huber, Claudia Stocsits, Peter F. Stadler, and Kay Nieselt.
BMC Biology, 5:25–25, 2007. pdf, url (22 citations).
[122] A high-resolution map of transcription in the yeast genome. Lior David, Wolfgang Huber, Marina Granovskaia, Jörn Tödling, Curtis J. Palm, Lee Bofkin, T. Jones, Ron W. Davis,
and Lars M. Steinmetz. PNAS, 103(14):5320–5325, 2006. pdf, url (393 citations).
[123] Analysis of cell-based RNAi screens. Michael Boutros, Lígia P. Brás, and Wolfgang Huber.
Genome Biology, 7(7), 2006. pdf, url (149 citations).
[124] Transcript mapping with high-density oligonucleotide tiling arrays. Wolfgang Huber, Jörn
Tödling, and Lars M. Steinmetz. Bioinformatics, 22(16):1963–1970, 2006. pdf, url (91 citations).
[125] Statistical methods and software for the analysis of high-throughput reverse genetic assays
using flow cytometry readouts. Florian Hahne, Dorit Arlt, Mamatha Sauermann, Meher Majety,
Annemarie Poustka, Stefan Wiemann, and Wolfgang Huber. Genome Biology, 7(8), 2006. pdf,
url (14 citations).
[126] Reproducible statistical analysis in microarray profiling studies. Ulrich Mansmann, Markus
Ruschhaupt, and Wolfgang Huber. Methods of Information in Medicine, 45:139–145, 2006. pdf,
url.
[127] The LIFEdb database in 2006. Alexander Mehrle, Heiko Rosenfelder, Ingo Schupp, Coral del
Val, Dorit Arlt, Florian Hahne, Stephanie Bechtel, Jeremy Simpson, Oliver Hofmann, Winston
Hide, Karl-Heinz Glatting, Wolfgang Huber, Rainer Pepperkok, Annemarie Poustka, and Stefan
Wiemann. Nucleic Acids Research, 34(Database issue):415–418, 2006. pdf, url (21 citations).
[128] Robert Gentleman, Florian Hahne, and Wolfgang Huber. Visualizing genomic data. Technical
Report 10, Bioconductor Project Working Papers, 2006. pdf, url.
[129] Image analysis for microscopy screens. Oleg Sklyar and Wolfgang Huber. R News, 6(5):12–
16, 2006. pdf, url.
[130] Transcript mapping with high-density tiling arrays. Matthew Ritchie and Wolfgang Huber.
R News, 6(5):23–27, 2006. pdf, url.
[131] BioMart and Bioconductor: a powerful link between biological databases and microarray
data analysis. Steffen Durinck, Yves Moreau, Arek Kasprzyk, Sean Davis, Bart De Moor, Alvis
Brazma, and Wolfgang Huber. Bioinformatics, 21:3439–3440, 2005. pdf, url (286 citations).
[132] Functional profiling: from microarrays via cell-based assays to novel tumor relevant modulators of the cell cycle. Dorit Arlt, Wolfgang Huber, Urban Liebel, C. Schmidt, Meher Majety,
Mamatha Sauermann, Heiko Rosenfelder, Stefanie Bechtel, Alexander Mehrle, Detlev Bannasch,
Ingo Schupp, Markus Seiler, Jeremy C. Simpson, Florian Hahne, Petra Moosmayer, Markus
Ruschhaupt, Birgit Guilleaume, Ruth Wellenreuther, Rainer Pepperkok, Holger Sültmann, Annemarie Poustka, and Stefan Wiemann. Cancer Research, 65(17):7733–7742, 2005. pdf, url (19
citations).
[133] Systematic comparison of surface coatings for protein microarrays. Birgit Guilleaume, Andreas Buness, C. Schmidt, F. Klimek, G. Moldenhauer, Wolfgang Huber, Dorit Arlt, Ulrike Korf,
Stefan Wiemann, and Annemarie Poustka. Proteomics, 5:4705–4712, 2005. pdf, url (32 citations).
[134] Gene expression in kidney cancer is associated with cytogenetic abnormalities, metastasis
formation, and patient survival. Holger Sültmann, Anja von Heydebreck, Wolfgang Huber,
Rupert Kuner, Andreas Buness, Markus Vogt, Bastian Gunawan, Martin Vingron, Laszlo Fuzesi,
and Annemarie Poustka. Clinical Cancer Research, 11:646–655, 2005. pdf, url (52 citations).
[135] arrayMagic: two-colour cDNA microarray quality control and preprocessing. Andreas
Buness, Wolfgang Huber, Klaus Steiner, Holger Sültmann, and Annemarie Poustka. Bioinformatics, 21(4):554–556, 2005. pdf, url (36 citations).
[136] Novel cancer relevant cell cycle modulators identified in automated cell-based assays. Dorit
Arlt, Wolfgang Huber, Mamatha Sauermann, Meher Majety, Florian Hahne, Rainer Pepperkok,
Annemarie Poustka, and Stefan Wiemann. European Journal of Cell Biology, 84(Suppl. 55):30,
2005.
[137] Wolfgang Huber, Anja von Heydebreck, and Martin Vingron. Bioinformatics - from Genomes
to Therapies, chapter Low-level analysis of microarray experiments. Wiley-VCH, 2005. pdf.
[138] On the synthesis of microarray experiments. Robert Gentleman, Markus Ruschhaupt, and
Wolfgang Huber. Journal de la Société Française de Statistique, 146(1-2), 2005. pdf, url.
[139] Robert Gentleman, Vincent J. Carey, Wolfgang Huber, Rafael Irizarry, and Sandrine Dudoit, editors. Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Springer,
2005. url (1925 citations).
[140] Bioconductor: open software development for computational biology and bioinformatics.
Robert C. Gentleman, Vincent J. Carey, Douglas M. Bates, Ben Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, Laurent Gautier, Y.C. Ge, Jeff Gentry, Kurt Hornik, Torsten Hothorn,
Wolfgang Huber, Stefano Iacus, Rafael Irizarry, Friedrich Leisch, Cheng Li, Martin Maechler,
Anthony J. Rossini, Günther Sawitzki, Colin Smith, Gordon Smyth, Luke Tierney, Jean Y.H. Yang,
and J.H. Zhang. Genome Biology, 5(10), 2004. pdf, url (5421 citations).
[141] matchprobes: a Bioconductor package for the sequence-matching of microarray probe elements. Wolfgang Huber and Robert Gentleman. Bioinformatics, 20:1651–1652, 2004. pdf, url
(25 citations).
[142] A compendium to ensure computational reproducibility in high-dimensional classification
tasks. Markus Ruschhaupt, Wolfgang Huber, Annemarie Poustka, and Ulrich Mansmann. Statistical Applications in Genetics and Molecular Biology, 3(37), 2004. pdf, url (90 citations).
[143] Systematic analysis of T7 RNA polymerase based in vitro linear RNA amplification for use
in microarray experiments. Jörg Schneider, Andreas Buness, Wolfgang Huber, Joachim Volz,
Petra Kioschis, Mathias Hafner, Annemarie Poustka, and Holger Sültmann. BMC Genomics,
5(1):29, 2004. pdf, url (60 citations).
[144] From ORFeome to biology: a functional genomics pipeline. Stefan Wiemann, Dorit Arlt,
Wolfgang Huber, Ruth Wellenreuther, Simone Schleeger, Alexander Mehrle, Stephanie Bechtel,
Mamatha Sauermann, Ulrike Korf, Rainer Pepperkok, Holger Sültmann, and Annemarie Poustka.
Genome Research, 14(10B):2136–44, 2004. pdf, url (35 citations).
[145] Wolfgang Huber, Anja von Heydebreck, and Martin Vingron. Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics, chapter Error models for microarray intensities. John
Wiley & Sons, 2004. pdf (12 citations).
[146] Anja von Heydebreck, Wolfgang Huber, and Robert Gentleman. Encyclopedia of Genetics,
Genomics, Proteomics and Bioinformatics, chapter Differential Expression with the Bioconductor
Project. John Wiley & Sons, 2004. pdf (51 citations).
[147] Multi-domain protein families and domain pairs: Comparison with known structures and
a random model of domain recombination. Gordana Apic, Wolfgang Huber, and Sarah A.
Teichmann. Journal of Structural and Functional Genomics, 4:67–78, 2003. pdf (76 citations).
[148] Cytogenetic and morphologic typing of 58 papillary renal cell carcinomas: Evidence for a
cytogenetic evolution of type 2 from type 1 tumors. Bastian Gunawan, Anja von Heydebreck,
Thekla Fritsch, Wolfgang Huber, Rolf-Hermann Ringert, Gerhard Jakse, and László Füzesi. Cancer Research, 63:6200–6205, 2003. pdf, url (66 citations).
[149] Mathematical tree models for cytogenetic development in solid tumors. Anja von Heydebreck, Bastian Gunawan, Wolfgang Huber, Martin Vingron, and Laszlo Füzesi. Verhandlungen
der Deutschen Gesellschaft für Pathologie, 2003.
[150] Parameter estimation for the calibration and variance stabilization of microarray data.
Wolfgang Huber, Anja von Heydebreck, Holger Sültmann, Annemarie Poustka, and Martin Vingron. Statistical Applications in Genetics and Molecular Biology, 2(1):Article 3, 2003. pdf, url
(158 citations).
[151] Wolfgang Huber, Anja von Heydebreck, and Martin Vingron. Analysis of microarray gene
expression data. In Martin Bishop et al., editor, Handbook of Statistical Genetics. John Wiley &
Sons, Ltd, Chichester, UK, 2003. pdf (53 citations).
[152] Prognostic factors influencing surgical management and outcome of gastrointestinal stromal tumours. C. Langer, Bastian Gunawan, P. Schüler, Wolfgang Huber, Laszlo Füzesi, and
H. Becker. British Journal of Surgery, 90:332–339, 2003. pdf, url (114 citations).
[153] Transcription profiling of renal cell carcinoma. Wolfgang Huber, Judith M. Boer, Anja von
Heydebreck, Bastian Gunawan, Martin Vingron, László Füzesi, Annemarie Poustka, and Holger
Sültmann. Verhandlungen der Deutschen Gesellschaft für Pathologie, 86:153–164, 2002.
[154] Variance stabilization applied to microarray data calibration and to the quantification of
differential expression. Wolfgang Huber, Anja von Heydebreck, Holger Sültmann, Annemarie
Poustka, and Martin Vingron. Bioinformatics, 18 Suppl 1:96–104, 2002. pdf, url (1673 citations).
[155] Identification and classification of differentially expressed genes in renal cell carcinoma by
expression profiling on a global human 31,500-element cDNA array. Judith M. Boer, Wolfgang Huber, Holger Sültmann, Friederike Wilmer, Anja von Heydebreck, Stefan Haas, Bernhard
Korn, Bastian Gunawan, Astrid Vente, Laszlo Füzesi, Martin Vingron, and Annemarie Poustka.
Genome Research, 11(11):1861–1870, 2001. pdf, url (145 citations).
[156] Prognostic impacts of cytogenetic findings in clear cell renal cell carcinoma: Chromosome
translocation der(3)t(3;5) or gain of 5q predict a distinct clinical phenotype with favourable
prognosis. Bastian Gunawan, Wolfgang Huber, Meike Holtrup, Anja von Heydebreck, Thomas
Efferth, Annemarie Poustka, Rolf-Hermann Ringert, Gerhard Jakse, and László Füzesi. Cancer
Research, 61:7731–7738, 2001. pdf, url (67 citations).
[157] FLASHFLOOD: A 3D field-based similarity search and alignment method for flexible
molecules. Michael C. Pitman, Wolfgang Huber, Hans Horn, Andreas Krämer, Julia E. Rice,
and William C. Swope. Journal of Computer-Aided Molecular Design, 15:587–612, 2001. pdf,
url (18 citations).
[158] Identifying splits with clear separation: A new class discovery method for gene expression
data. Anja von Heydebreck, Wolfgang Huber, Annemarie Poustka, and Martin Vingron. Bioinformatics, 17 Suppl. 1:S107–114, 2001. pdf, url (77 citations).
[159] Gene expression profiling of kidney cancer using a tumor-specific cDNA microarray. Holger
Sültmann, Wolfgang Huber, Laszlo Fuzesi, Bastian Gunawan, Anja von Heydebreck, Martin
Vingron, and Annemarie Poustka. Clinical Cancer Research, 7(11, Suppl. S):155, 2001. pdf, url.
[160] Quasistationary distributions of dissipative nonlinear quantum oscillators in strong periodic driving fields. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Physical
Review E, 61:4883–4889, 2000. pdf, url (26 citations).
[161] Stochastic wave function method versus density matrix: a numerical comparison. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Computer Physics Communications,
104:46–58, 1997. pdf, url (16 citations).
[162] Vestibular-neck interaction and transformation of sensory coordinates. Thomas Mergner,
Wolfgang Huber, and Wolfgang Becker. Journal of Vestibular Research, 7:347–367, 1997. (100
citations).
[163] Spatially resolved measurement and modeling of blood brain barrier permeability. Wolfgang Huber, Klaus Kopitzki, Jens Timmer, and Peter Warnke. Biomedizinische Technik, 41 suppl.
1:160, 1996. pdf.
[164] Fast Monte Carlo algorithm for nonequilibrium systems. Heinz Peter Breuer, Wolfgang
Huber, and Francesco Petruccione. Physical Review E, 53:4232–4235, 1996. pdf, url.
[165] The three-loop model: a neural network for the generation of saccadic reaction times.
Burkhart Fischer, Stefan Gezeck, and Wolfgang Huber. Biological Cybernetics, 72:185–196,
1995. pdf, url (27 citations).
[166] The macroscopic limit in a stochastic reaction–diffusion process. Heinz Peter Breuer, Wolfgang Huber, and Francesco Petruccione. Europhysics Letters, 30:69–74, 1995. pdf, url (26
citations).
[167] Fluctuation effects on wave propagation in a reaction–diffusion process. Heinz Peter Breuer,
Wolfgang Huber, and Francesco Petruccione. Physica D, 73:259–273, 1994. pdf, url (49 citations).
[168] Wolfgang Huber. Dynamics of strongly driven open quantum systems. PhD thesis, University
of Freiburg, 1998. pdf.
[169] Wolfgang Huber. The description of reaction diffusion processes by master equations.
Diploma thesis, University of Freiburg, 1994. pdf.