Three One-Day Short Courses at
the 29th New England Statistics Symposium
Friday, April 24, 2015, University of Connecticut
8:30am — 5pm, at Student Union Room TBA
To register, please visit
To know more, please visit
Course 1: Bayesian Biostatistics: Design of Clinical Trials and Subgroup Analysis
Instructor Dr. Peter Müller is Professor of Statistics and Mathematics at the University of Texas, Austin. He works on Bayesian inference, with a focus on nonparametric
Bayesian methods, simulation based methods, optimal design and multiple comparison
procedures. He is interested in applications in biostatistics and bioinformatics, including in particular Bayesian clinical trial design, hierarchical models, population PK/PD
models, inference for histone modifications and tumor heterogeneity. Dr. Müller is a
Fellow of the ASA, a Fellow of the IMS, and served as president of ISBA.
Outline This short course is an introduction to Bayesian inference methods that are
commonly used in biomedical applications, with a focus on two important problems:
Bayesian adaptive clinical trial design and subgroup analysis. The course is organized in 4 classes: 1)
Review of Bayesian inference, including basics of MCMC, posterior asymptotics and frequentist operating
characteristics. Linear regression and hierarchical models. 2) Clinical trial design (phase I and II): CRM,
TITE-CRM and more. 3) Clinical trial design (phase II): predictive probability designs, proper Bayes designs,
adaptive randomization and decision theoretic designs. 4) Subgroup analysis and multiplicities: how posterior
inference adjusts for multiplicities, but solves only half the problem. The emphasis will be on data analysis
and practical implementation issues.
Prerequisites The course is accessible to anyone with a knowledge of statistical inference at the level of
introductory graduate-level courses in mathematical statistics and probability. An appreciation of statistical
inference in biomedical research is desirable, but not strictly required. An important part of the course is the
problem sets, which students are asked to work on independently (outside the course). Familiarity with R or a
comparable computing environment is essential for the problem sets (but not for the lectures). Working through the
problem sets is important for an optimal learning experience, but is not part of the lectures.
Course 2: Modern Multivariate Statistical Learning: Methods and Applications
Instructors Dr. Kun Chen is Assistant Professor, and Dr. Jun
Yan is Associate Professor in the Department of Statistics, University of Connecticut. Dr. Chen is interested in multivariate analysis, dimension reduction, variable selection, time series analysis and
statistical computing, with a focus on analyzing large-scale multivariate data. He has experience working on a variety of statistical
applications in ecology, genetics, medical imaging, and health sciences. Dr. Yan works on multivariate dependence, survival analysis,
clustered data analysis, spatial data analysis, spatial extremes, estimating functions, and statistical computing. He is committed to making his statistical methods available via
open source software and has authored and is actively maintaining a collection of R packages in the public
Outline This short course focuses on state-of-the-art developments in multivariate statistical learning, which
exemplify the successful marriage between statistical modeling and optimization. It targets many applications
in various fields where the essential goal is to decode the underlying associations between/within a possibly
large number of features and outcomes. The challenges in dealing with noisy multivariate data of high
dimensionality/large volume have driven a genuine refinement and expansion of the classical multivariate
analysis toolkit. Several classes of multivariate tools for simultaneous structured dimension reduction and
model estimation will be introduced, in which multiple indispensable data attributes and modeling elements,
e.g., low-rank structure, sparsity, variable grouping, and multi-view data, are seamlessly integrated. Taking into account
such complex structures in an integrative yet manageable way significantly enhances model predictive power,
improves model interpretation, and enables data analysts to gain critical insights from the data. The course
consists of 5 sessions: 1) overview of multivariate learning; 2) principal component analysis and new variants;
3) canonical correlation analysis and new variants; 4) multivariate regression and new variants; and 5) other
multivariate methods and recent developments. Case studies are provided with examples in finance, insurance,
ecology, imaging analysis, genetics, health science, and industrial engineering.
Prerequisites Entry-level graduate courses in statistics or exposure to regression and multivariate analysis
are desirable. Participants are encouraged to bring their own laptop computers to the session and to have the
latest version of R installed on their computers. The participants will have the opportunity to go through
several real data examples and case studies together with the instructors.
Course 3: Boosting R Skills and Automating Statistical Reports
Instructor Dr. Yihui Xie is a software engineer at RStudio, Inc. He is interested in
interactive statistical graphics, statistical computing, and web applications. He is an active
R user and the author of several R packages, such as animation, formatR, Rd2roxygen, and
knitr, among which the animation package won the 2009 John M. Chambers Statistical
Software Award of the ASA. He is also the author of the book “Dynamic Documents with
R and knitr”. In 2006, he founded the Capital of Statistics, which has grown into a large
online community on statistics in China. He initiated the first Chinese R conference in
2008, and has been organizing R conferences in China since then. During his PhD training
at Iowa State University, he won the Vince Sposito Statistical Computing Award (2011)
and the Snedecor Award (2012) in the Department of Statistics.
Outline This intermediate level short course consists of Part I “R Programming” and Part II “Dynamic
Reporting with R”. Part I aims to improve your R programming skills. It covers some basic and advanced
topics in R, as well as R package development. Part II is a tutorial on two packages for automatic reporting,
knitr (Xie, 2013) and rmarkdown (Allaire et al., 2014). It covers the basic idea of literate programming as
well as its role in reproducible research. A variety of document formats supported by knitr will be introduced,
including R LaTeX (.Rnw) and R Markdown (.Rmd). We will show useful features of knitr, such as creating
tables and plots from data, caching, and cross references. We will also provide examples of advanced features
such as chunk hooks, and calling foreign languages (shell scripts, Python, C++, Julia, etc.). Finally we will
introduce the simple Markdown language, as well as how to convert R Markdown to many other document
formats, such as LaTeX/PDF, HTML, and Word. Hopefully R Markdown will make it much easier for data
analysts to prepare reports and authors to publish their work related to data analysis. Many people agree
that reproducible research is important (see, for example, the Duke Saga
21528593), but have the impression that it also means more work. We will show that this is
wrong. Generating reports from knitr dynamic documents is not only a better approach to reproducible
research, but also easier and even fun!
Prerequisites The attendees should have some familiarity with programming (not necessarily with R).
Some prior knowledge of LaTeX and HTML is helpful but not required for this tutorial. Potential attendees
include those who write reports that involve data analysis, ranging from homework and project reports to papers,
books, and websites. Please have the latest version of R, the rmarkdown package, and RStudio (
installed on your laptop. RStudio IDE is optional but recommended. I will be using RStudio for demo
purposes. If you have already been using other text editors such as Emacs + ESS, you are free to stay with
your own choice, and I will explain how things work outside RStudio.