Program

Daily Schedule

9:00-10:30 == Lectures
10:30-10:45 == Coffee break
10:45-12:00 == Lectures
12:00-13:30 == Lunch
13:30-15:00 == Lectures
15:00-15:15 == Coffee break
15:15-16:30 == Lectures

Monday, May 6

All day == The propensity score as a tool for causal inference

Tuesday, May 7

Morning == Bayesian inference and Markov Chain Monte Carlo
Afternoon == Bayesian disease mapping

Wednesday, May 8

Morning == Misclassification in Health Care Data
Afternoon == Individual patient data meta analysis

Thursday, May 9

Morning == High-dimensional data analysis using penalized regression methods
Afternoon == Statistical approaches to adaptive treatment strategies

Course Descriptions

Title: The propensity score as a tool for causal inference

Instructors: David A. Stephens and Erica E. M. Moodie

In this full-day course, fundamental principles of causal inference, with a particular focus on the propensity score and covariate balance, will be covered. Regression, matching, and weighting techniques will be discussed, and exercises will be used to provide practical analytic experience.

Title: Bayesian inference and Markov Chain Monte Carlo

Instructor: David A. Stephens

This half-day course will introduce students to applied techniques in Bayesian inference, with a focus on Bayesian solutions to classical statistical problems such as regression and hierarchical modelling, and more advanced topics such as spatial and non-parametric modelling. It will also demonstrate how sampling-based computational solutions, such as those provided by Markov chain Monte Carlo, can be used when exact or analytically tractable solutions are not readily available.

Title: Bayesian disease mapping

Instructor: Alexandra M. Schmidt

The mapping of disease incidence and prevalence has long been a part of public health, epidemiology, and the study of disease in human populations (Koch 2005). This short course provides an introduction to Bayesian disease mapping. I will start by describing some exploratory tools to analyze areal data, and introduce the conditional autoregressive (CAR) model. CAR specifications are commonly used as latent structures in hierarchical models for areal level data. I will also discuss alternative models to the CAR structure, and introduce a couple of packages available in R to fit such models. Examples include the analysis of the number of cases of dengue fever across the districts of the city of Rio de Janeiro, and the number of cases of diabetes across neighbourhoods of Montreal as a function of an estimated neighbourhood soda-consumption index.

Title: Misclassification in Health Care Data

Instructor: Robert Platt

Administrative health care data are an important resource for research in health services, pharmacoepidemiology, and clinical medicine. The course will describe the different sources of administrative data, differentiating electronic medical record data from health care claims data, and discuss some of the statistical challenges involved in using these data. Misclassification is an inherent problem in these data, because exposures and outcomes are typically determined using a list of claims. This misclassification can affect study results. Topics discussed include:

Description of data sources
Types of questions typically addressed using administrative data (description, prediction, estimation of causal effects)
Data structure and management
Misclassification using administrative codes
Estimation methods that correct for misclassification

Title: Individual patient data meta analysis

Instructor: Andrea Benedetti

Individual patient data meta analyses (IPD-MA) are considered the gold standard for evidence synthesis, and have been conducted to address important questions in many domains. The course will describe how data for an IPD-MA are assembled, differentiating aggregate data meta analysis from IPD-MA, and discuss some of the statistical challenges involved in using these data. Students will learn core analytic skills for working with these data. Using examples from treatment of drug-resistant tuberculosis and depression screening, we will discuss several key aspects of analyses of these data. Students will be provided with sample data and code. Topics include:

Assembling IPD-MA
Types of questions typically addressed using IPD-MA
Data structure and management
Key challenges and opportunities
Analysis methods/worked examples

Title: High-dimensional data analysis using penalized regression methods

Instructor: Sahir Rai Bhatnagar

In high-dimensional (HD) data, where the number of covariates (p) greatly exceeds the number of observations (n), estimation can benefit from the bet-on-sparsity principle, i.e., the assumption that only a small number of predictors are relevant to the response. This assumption can lead to more interpretable models, improved predictive accuracy, and algorithms that are computationally efficient. In medical data, where sample sizes are particularly small due to high data collection costs, we must often assume a sparse model because there isn’t enough information to estimate p parameters. For these reasons, penalized regression methods have generated substantial interest over the past decade, since they can set model coefficients exactly to zero. We will provide an overview of the lasso and group lasso, two of the most popular penalized regression techniques available. We will provide details on both the theoretical and computational aspects of these methods and demonstrate a real-data example with R code.

Title: Statistical approaches to adaptive treatment strategies

Instructor: Erica E. M. Moodie

Precision medicine, in which treatments are tailored to evolving patient characteristics, is an area of growing interest in statistics, computer science, and clinical medicine. In this course, I will outline the motivation for the individualization of treatment, and present an overview of the current analytic tools in a single-stage setting, to provide a foundation for extending to the multi-stage treatment setting more commonly implemented for the care of chronic conditions.

In the first half of the course, I will provide an introduction to precision medicine and the motivation for adaptive treatment strategies, including examples of tailored treatments. In the second half, I will focus on methods of estimation, specifically considering three regression-based approaches (Q-learning, G-estimation, and dynamic weighted ordinary least squares (dWOLS)), as well as two value-search approaches (marginal structural models and outcome-weighted learning (OWL)).

McGill Summer School in Health Data Analytics