Analysis of Administrative Health Care Data
Robert Platt
States and events: an orientation to rates, risks, and hazards
James A. Hanley
Flexible modeling of survival data: challenges, methods and applications
Michal Abrahamowicz
A Tutorial on ADMM algorithms
Yi Yang
An Introduction to Bayesian Inference and MCMC
David A. Stephens
An introduction to causal inference and propensity score methods
David A. Stephens and Erica E. M. Moodie
Analysis of spatially structured data
Alexandra M. Schmidt
Description of courses
Title: Analysis of Administrative Health Care Data
Presenter: Robert Platt
Administrative health care data are an important resource for research in health services, pharmacoepidemiology, and clinical medicine. The course will describe the different sources of administrative data, differentiating electronic medical record data from health care claims data, and discuss some of the statistical challenges involved in using these data. Students will learn core analytic skills for working with these data. Using examples from pharmacoepidemiology and a realistic simulation of a health care claims database, we will discuss several key aspects of analyses of these data. Students will be provided with sample data and code. Topics include:
- Description of data sources
- Types of questions typically addressed using administrative data (description, prediction, estimation of causal effects)
- Data structure and management
- Analysis methods/worked examples (e.g., estimation using high-dimensional propensity scores)
Title: States and events: an orientation to rates, risks, and hazards
Presenter: James A. Hanley
I will begin with the fundamental scientific concept of ‘state’ (exemplified by the term ‘status’ in a popular social media network) and introduce the statistical parameters and distributions used to describe and compare the speeds with which events (transitions between states) occur. To do so, I will use historical and contemporary examples from demography, industrial testing, and medical and epidemiological research. The statistical techniques used to address these research questions are often referred to as ‘survival’ analysis, an overly narrow term that misses the unity that comes from handling ‘censored’/‘interval’ data of any type by likelihood methods and by conditioning. I will describe some of the subtleties involved in comparisons of time durations or event rates, and some embarrassing/serious statistical blunders. I will explain how most of the statistical information in large databases can be extracted using a ‘smart-sampling’ approach widely used in epidemiology/economics. In case circumstances do not permit us to enact a live version of the ‘Kaplan-Meier Theatre’ (Gerds, 2016), please read his article ahead of time. See also http://www.biostat.mcgill.ca/hanley, and in particular the link to the ‘Bridge of Life’ material. Several articles relevant to the lecture can be found by entering the terms ‘longevity’, ‘Titanic’, ‘Oscar’, ‘tumbler’, ‘avalanche’, ‘HIV’, ‘HPV’, and ‘screening’ in the search box on JH’s home page.
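As a rough orientation to the quantities mentioned above, the following minimal sketch computes a crude event rate (events per unit of time at risk) and a Kaplan-Meier survival curve from right-censored durations. The function names and the toy data are hypothetical and are not taken from the course material.

```python
def event_rate(durations, events):
    """Crude event rate: observed events divided by total time at risk."""
    return sum(events) / sum(durations)


def kaplan_meier(durations, events):
    """Return [(time, S(time))] at each distinct observed event time.

    events[i] is 1 if the event was observed at durations[i], 0 if censored.
    Subjects censored at time t are counted as still at risk at t.
    """
    data = sorted(zip(durations, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        # Collect everyone whose follow-up ends exactly at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            # Conditional probability of surviving t, given at risk at t.
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= deaths + censored
    return curve


# Hypothetical follow-up times (years) and event indicators (0 = censored).
durations = [1, 2, 2, 3, 4, 5]
events = [1, 1, 0, 1, 0, 1]
print(event_rate(durations, events))
print(kaplan_meier(durations, events))
```

The product of conditional survival probabilities is where conditioning enters: each factor uses only those still at risk, which is how censored observations contribute information without their event times being known.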
Title: Flexible modeling of survival data: challenges, methods and applications
Presenter: Michal Abrahamowicz
Cox’s proportional hazards (PH) model is one of the most popular statistical methods, with more than 20,000 references in the medical literature alone. Yet most users are not aware of its underlying assumptions and do not realize the implications of violating them. Recent statistical research in this area provides an excellent opportunity both to introduce students to more advanced modeling techniques and to illustrate how applications of state-of-the-art statistical methods may yield new insights into complex processes studied in clinical and public health research.
Survival or time-to-event analysis focuses on dynamic processes that evolve over time. The class will focus on two aspects of time-related changes that require more advanced methods. Firstly, we will deal with associations that vary over time. To this end, we will explore the reasons for frequent violation of the PH assumption, which postulates that the effects of risk factors remain constant over time, and introduce flexible methods that address this issue while accounting for potential non-linear effects of continuous variables. Secondly, we will introduce the concept of time-varying covariates and explain the challenges in modeling the effects of variables that change their values during the study period. Then, we will introduce flexible methods for modeling time-varying covariates. Simulation results will be used to assess the performance of the proposed methods. Guidance on the use of R programs that implement these methods will also be provided.
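For orientation, the PH assumption and the two relaxations described above can be written as follows. The notation (baseline hazard h_0, log hazard ratio \beta, smooth functions \beta(t) and g) is the conventional one and is an assumption here, not the presenter's own formulation.

```latex
\begin{align}
  % PH model: the covariate effect \beta is constant over time
  h(t \mid x) &= h_0(t)\,\exp(\beta x) \\
  % Time-varying effect: the log hazard ratio is a function of t
  h(t \mid x) &= h_0(t)\,\exp\{\beta(t)\,x\} \\
  % Time-varying covariate x(t), with a non-linear effect g
  h(t \mid x(t)) &= h_0(t)\,\exp\{\beta(t)\,g(x(t))\}
\end{align}
```

Under the first line the hazard ratio between two covariate values does not depend on t. The second and third lines drop that restriction.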
The practical relevance of the flexible methods will be illustrated using real-life examples of prediction of survival after lung cancer diagnosis and septic shock, and adverse effects of medications.
Title: A Tutorial on ADMM algorithms
Presenter: Yi Yang
The alternating direction method of multipliers (ADMM) is regarded as a very flexible approach for solving large-scale sparse feature learning problems efficiently. When the number of observations and/or the feature dimension is large, a consensus (distributed) version of ADMM can be used, which distributes both the computation and the data set across multiple computing nodes. In this short course, I will give a tutorial on the ADMM algorithm and its applications in solving sparse learning models.
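To give a flavour of the algorithm, here is a minimal sketch of ADMM on a deliberately simple sparse problem: minimize (1/2)||x - b||^2 + lam * ||z||_1 subject to x = z. This toy problem, the function names, and the default values of rho and n_iter are illustrative assumptions, not the presenter's material. With a general design matrix, the x-update would become a linear solve instead of the elementwise formula used here.

```python
def soft_threshold(v, kappa):
    """Proximal operator of kappa * |v|."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def admm_l1_denoise(b, lam, rho=1.0, n_iter=200):
    """Scaled-form ADMM for min 0.5*||x - b||^2 + lam*||z||_1 s.t. x = z."""
    n = len(b)
    x = [0.0] * n
    z = [0.0] * n
    u = [0.0] * n  # scaled dual variable
    for _ in range(n_iter):
        # x-update: quadratic subproblem, solved in closed form.
        x = [(b[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        # z-update: soft-thresholding enforces sparsity.
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # Dual update: accumulate the constraint residual x - z.
        u = [u[i] + x[i] - z[i] for i in range(n)]
    return z


# Hypothetical input; small entries of b are shrunk exactly to zero.
print(admm_l1_denoise([3.0, -0.5, 1.2, -2.0], lam=1.0))
```

For this toy problem the solution is known in closed form (soft-thresholding b at lam), which makes it a convenient check that the three updates are implemented correctly.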
Title: An Introduction to Bayesian Inference and MCMC
Presenter: David A. Stephens
This module will give an introduction to applied Bayesian inference techniques, focussing on Bayesian solutions to classical statistical problems, including regression and hierarchical modelling, and more advanced topics such as spatial and non-parametric modelling. It will also demonstrate how sampling-based computational solutions, such as those provided by Markov chain Monte Carlo, can be obtained when exact or analytically tractable solutions cannot be obtained easily.
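As an illustration of a sampling-based solution, the sketch below runs a random-walk Metropolis sampler for the mean of normally distributed data with known variance and a normal prior. This model is chosen only because its exact posterior is available for comparison. The data, prior settings, step size, and function names are all hypothetical.

```python
import math
import random


def log_posterior(mu, data, sigma=1.0, prior_mean=0.0, prior_sd=10.0):
    """Unnormalized log posterior: normal likelihood plus normal prior."""
    log_lik = -sum((y - mu) ** 2 for y in data) / (2.0 * sigma ** 2)
    log_prior = -((mu - prior_mean) ** 2) / (2.0 * prior_sd ** 2)
    return log_lik + log_prior


def metropolis(data, n_iter=20000, step_sd=1.0, start=0.0, seed=1):
    """Random-walk Metropolis sampler; returns the full chain of draws."""
    rng = random.Random(seed)
    mu = start
    lp = log_posterior(mu, data)
    draws = []
    for _ in range(n_iter):
        proposal = mu + rng.gauss(0.0, step_sd)
        lp_prop = log_posterior(proposal, data)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, lp_prop - lp)):
            mu, lp = proposal, lp_prop
        draws.append(mu)
    return draws


# Hypothetical data; discard the first 2000 draws as burn-in.
data = [1.2, 0.8, 1.5, 0.9, 1.1]
kept = metropolis(data)[2000:]
print(sum(kept) / len(kept))
```

In this conjugate setting the exact posterior mean is sum(data) / (n + sigma^2 / prior_sd^2), so the Monte Carlo average can be checked against it. In models without such a closed form, the same sampler applies unchanged once log_posterior is replaced.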
Title: An introduction to causal inference and propensity score methods
Presenters: David A. Stephens and Erica E. M. Moodie
Causal inference attempts to uncover the structure of the data and eliminate all non-causative explanations for an observed association. Most inference problems in biostatistics seek to uncover causal relationships, but this aim is hindered by issues such as confounding in non-experimental data or non-compliance in randomized studies. This workshop will introduce fundamental principles in causal inference, with a particular focus on the propensity score and covariate balance.
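The role of the propensity score can be sketched with a small simulation in which a binary confounder affects both treatment and outcome. Everything here is an assumption made for illustration: the data-generating model, the true effect of 2.0, and the function names. The propensity score is estimated simply as the proportion treated within each level of the confounder, and the treatment effect is then estimated by normalized inverse probability weighting.

```python
import random


def simulate(n=20000, seed=0):
    """Simulate (x, t, y): x confounds treatment t and outcome y."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        p_treat = 0.8 if x else 0.2          # treatment depends on x
        t = 1 if rng.random() < p_treat else 0
        y = 2.0 * t + 3.0 * x + rng.gauss(0.0, 1.0)  # true effect is 2.0
        rows.append((x, t, y))
    return rows


def naive_difference(rows):
    """Unadjusted difference in mean outcome, treated minus untreated."""
    treated = [y for _, t, y in rows if t == 1]
    control = [y for _, t, y in rows if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def estimate_propensity(rows):
    """Estimated P(t = 1 | x) as the treated proportion at each level of x."""
    ps = {}
    for level in (0, 1):
        ts = [t for x, t, _ in rows if x == level]
        ps[level] = sum(ts) / len(ts)
    return ps


def ipw_difference(rows):
    """Normalized inverse-probability-weighted difference in means."""
    ps = estimate_propensity(rows)
    num1 = den1 = num0 = den0 = 0.0
    for x, t, y in rows:
        if t == 1:
            w = 1.0 / ps[x]
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - ps[x])
            num0 += w * y
            den0 += w
    return num1 / den1 - num0 / den0


rows = simulate()
print(naive_difference(rows), ipw_difference(rows))
```

In this simulation the unadjusted comparison is biased upward because treated subjects are more likely to have x = 1. Weighting by the inverse of the estimated propensity score balances x across the two groups.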
Title: Analysis of spatially structured data
Presenter: Alexandra M. Schmidt
This course gives an introduction to the spatial modelling of point-referenced and areal-level data under the Bayesian paradigm. The course is divided into three parts. The first will introduce spatially referenced data, define Gaussian processes, discuss stationarity and isotropy, and then define variograms and the most commonly used correlation functions. Bayesian kriging will also be discussed in this part. The second part will introduce the modelling of areal data and disease mapping. In the third part, recent topics in spatial statistics, such as spatial confounding and the modelling of skewed processes, will be discussed. We will use the NIMBLE software to analyze several real data sets.
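As a small illustration of the first part, the sketch below evaluates an exponential correlation function, the corresponding semivariogram, and the covariance matrix it induces for a stationary, isotropic Gaussian process without a nugget effect. The parameter names (sigma2 for the variance, phi for the range) and the site coordinates are hypothetical. The course itself uses NIMBLE, not this code.

```python
import math


def distance(s1, s2):
    """Euclidean distance between two sites given as (x, y) pairs."""
    return math.hypot(s1[0] - s2[0], s1[1] - s2[1])


def exponential_correlation(h, phi):
    """Exponential correlation at distance h with range parameter phi."""
    return math.exp(-h / phi)


def semivariogram(h, sigma2, phi):
    """Semivariogram implied by the exponential model (no nugget)."""
    return sigma2 * (1.0 - exponential_correlation(h, phi))


def covariance_matrix(sites, sigma2, phi):
    """Covariance matrix of the process at the given sites."""
    return [
        [sigma2 * exponential_correlation(distance(a, b), phi) for b in sites]
        for a in sites
    ]


# Hypothetical sites; isotropy means only inter-site distance matters.
sites = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
for row in covariance_matrix(sites, sigma2=2.0, phi=5.0):
    print(row)
```

Because the covariance depends on the sites only through their distance, pairs of sites that are equally far apart receive the same covariance, which is what stationarity and isotropy amount to in this model.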