Program
Daily Schedule
 9:00-10:30 == Lectures
 10:30-10:45 == Coffee break
 10:45-12:00 == Lectures
 12:00-13:30 == Lunch
 13:30-15:00 == Lectures
 15:00-15:15 == Coffee break
 15:15-16:30 == Lectures
Monday, May 6
 All day == The propensity score as a tool for causal inference
Tuesday, May 7
 Morning == Bayesian inference and Markov Chain Monte Carlo
 Afternoon == Bayesian disease mapping
Wednesday, May 8
 Morning == Misclassification in Health Care Data
 Afternoon == Individual patient data meta-analysis
Thursday, May 9
 Morning == Data analysis using penalized regression methods
 Afternoon == Statistical approaches to adaptive treatment strategies
Course Descriptions
Title: The propensity score as a tool for causal inference
Instructors: David A. Stephens and Erica E. M. Moodie
This full-day course covers the fundamental principles of causal inference, with a particular focus on the propensity score and covariate balance. Regression, matching, and weighting techniques will be discussed, and exercises will provide practical analytic experience.
Title: Bayesian inference and Markov Chain Monte Carlo
Instructor: David A. Stephens
This half-day course will introduce students to applied techniques in Bayesian inference, with a focus on Bayesian solutions to classical statistical problems such as regression and hierarchical modelling, and more advanced topics such as spatial and nonparametric modelling. It will also demonstrate how sampling-based computational solutions, such as those provided by Markov chain Monte Carlo, can be used when exact or analytically tractable solutions are not readily available.
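As a rough illustration of the sampling-based idea (not course material), the sketch below runs a random-walk Metropolis sampler in Python for a binomial proportion under a uniform prior. The data (7 successes in 10 trials), the step size, the seed, and all function names are assumptions chosen for the example. This toy posterior does have an exact form, Beta(8, 4), which makes it easy to check the sampler against the known posterior mean of 2/3.

```python
import math
import random


def log_posterior(theta, successes=7, trials=10):
    """Unnormalized log posterior for a binomial proportion, uniform prior.

    The data values are hypothetical and chosen only for illustration.
    """
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return successes * math.log(theta) + (trials - successes) * math.log(1.0 - theta)


def metropolis(n_iter=20000, step=0.2, start=0.5, seed=1):
    """Random-walk Metropolis sampler with a Gaussian proposal."""
    rng = random.Random(seed)
    theta = start
    current = log_posterior(theta)
    draws = []
    for _ in range(n_iter):
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        accept_prob = math.exp(min(0.0, candidate - current))
        if rng.random() < accept_prob:
            theta, current = proposal, candidate
        draws.append(theta)
    return draws


draws = metropolis()
kept = draws[2000:]  # discard an initial burn-in period
posterior_mean = sum(kept) / len(kept)
print(f"Estimated posterior mean: {posterior_mean:.3f} (exact value is 2/3)")
```

The same loop applies unchanged when the posterior has no closed form, since only the unnormalized log density is needed.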
Title: Bayesian disease mapping
Instructor: Alexandra M. Schmidt
The mapping of disease incidence and prevalence has long been a part of public health, epidemiology, and the study of disease in human populations (Koch 2005). This short course provides an introduction to Bayesian disease mapping. I will start by describing some exploratory tools to analyze areal data, and introduce the conditional autoregressive (CAR) model. CAR specifications are commonly used as latent structures in hierarchical models for areal-level data. I will also discuss alternative models to the CAR structure, and introduce a couple of packages available in R to fit such models. Examples include the analysis of the number of cases of dengue fever across the districts of the city of Rio de Janeiro, and the number of cases of diabetes across neighbourhoods of Montreal as a function of an estimated neighbourhood soda-consumption index.
Title: Misclassification in Health Care Data
Instructor: Robert Platt
Administrative health care data are an important resource for research in health services, pharmacoepidemiology, and clinical medicine. The course will describe the different sources of administrative data, differentiating electronic medical record data from health care claims data, and discuss some of the statistical challenges involved in using these data. Misclassification is an inherent problem in these data: exposures and outcomes are typically determined using a list of claims. This misclassification can affect study results. Topics discussed include:

Description of data sources

Types of questions typically addressed using administrative data (description, prediction, estimation of causal effects)

Data structure and management

Misclassification using administrative codes

Estimation methods that correct for misclassification.
Title: Individual patient data meta-analysis
Instructor: Andrea Benedetti
Individual patient data meta-analyses (IPDMA) are considered the gold standard for evidence synthesis, and have been conducted to address important questions in many domains. The course will describe how data for an IPDMA are collected, differentiate aggregate data meta-analysis from IPDMA, and discuss some of the statistical challenges involved in using these data. Students will learn core analytic skills for working with these data. Using examples from the treatment of drug-resistant tuberculosis and from depression screening, we will discuss several key aspects of the analysis of these data. Students will be provided with sample data and code. Topics include:

Assembling IPDMA

Types of questions typically addressed using IPDMA

Data structure and management

Key challenges and opportunities

Analysis methods/worked examples
Title: High-dimensional data analysis using penalized regression methods
Instructor: Sahir Rai Bhatnagar
In high-dimensional (HD) data, where the number of covariates (p) greatly exceeds the number of observations (n), estimation can benefit from the bet-on-sparsity principle, i.e., the assumption that only a small number of predictors are relevant to the response. This assumption can lead to more interpretable models, improved predictive accuracy, and algorithms that are computationally efficient. In medical data, where sample sizes are often particularly small due to high data collection costs, we must often assume a sparse model because there isn’t enough information to estimate p parameters. For these reasons, penalized regression methods have generated substantial interest over the past decade, since they can set model coefficients exactly to zero. We will provide an overview of the lasso and the group lasso, two of the most popular penalized regression techniques available. We will provide details on both the theoretical and computational aspects of these methods and demonstrate a real-data example with R code.
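To make the "exactly zero" property concrete, here is a minimal Python sketch (the course itself uses R) of the soft-thresholding operator. It relies on a simplifying assumption: when the predictors are orthonormal, the lasso solution equals the least-squares coefficients passed through this operator. The coefficient values and the penalty `lam` are hypothetical and chosen only for illustration.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


# Hypothetical least-squares coefficients for five orthonormal predictors.
ols_coefficients = [3.0, -0.2, 0.1, -1.5, 0.05]
lam = 0.5  # hypothetical penalty level

# Under the orthonormal-design assumption, this is the lasso solution.
lasso_coefficients = [soft_threshold(b, lam) for b in ols_coefficients]
selected = [j for j, b in enumerate(lasso_coefficients) if b != 0.0]

print(lasso_coefficients)
print("Predictors kept in the model:", selected)
```

Small coefficients are removed outright rather than merely shrunk, which is what makes the fitted model sparse. For general (non-orthonormal) designs, the same operator is typically applied one coordinate at a time inside an iterative algorithm.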
Title: Statistical approaches to adaptive treatment strategies
Instructor: Erica E. M. Moodie
Precision medicine, in which treatments are tailored to evolving patient characteristics, is an area of growing interest in statistics, computer science, and clinical medicine. In this presentation, I will outline the motivation for the individualization of treatment, and present an overview of the current analytic tools in a single-stage setting, to provide a foundation for extending to the multi-stage treatment setting more commonly implemented for the care of chronic conditions.
In the first part of the course, I will provide an introduction to precision medicine and the motivation for adaptive treatment strategies, including examples of tailored treatments. In the second part, I will focus on methods of estimation, specifically considering three regression-based approaches (Q-learning, G-estimation, and dynamic weighted ordinary least squares (dWOLS)), as well as two value-search approaches (marginal structural models and outcome-weighted learning (OWL)).