Statistics is concerned with the development and use of mathematical and computational methods for the collection, analysis, and interpretation of data in support of scientific inquiry, informed decision-making, and risk management. It draws on a broad range of tools, from probability theory to computer-intensive techniques. The main areas of research pursued by statisticians in the ISM network include the following.
Statistical research is largely motivated by collaboration with other disciplines. It finds applications in many fields, including biology, environmental science, finance and insurance, health sciences, hydrology, market research, and the social sciences. With the abundance of very large and complex data sets coming, for example, from social media and digital processes, financial transactions, astronomy, genomics, meteorology, or Big Science projects such as the Large Hadron Collider, the statistical treatment and analysis of Big Data has become a major challenge of modern statistics.
The statistics program gives graduate students the opportunity to study in these two major areas of modern statistics. The curriculum allows students to become well acquainted with the basic elements of mathematical statistics, decision theory, and applied statistics. Advanced graduate courses can also be offered in more specialized areas.
This program welcomes graduate students with a good background in calculus, mathematical statistics, numerical analysis, and probability (all at the undergraduate level). To obtain strong training in decision theory and mathematical statistics, students should take the basic course in measure and integration (for PhD students) and at least three courses at the intermediate and advanced levels.
This course is an introduction to statistical inference for parametric models. The following topics will be covered:
1. Distribution of functions of several random variables (distribution function and change of variable techniques), sampling distribution of mean and variance of a sample from Normal distribution.
2. Distribution of order statistics and sample quantiles.
3. Estimation: unbiasedness, Cramér–Rao lower bound and efficiency, method of moments and maximum likelihood estimation, consistency, limiting distributions, the delta method.
4. Sufficiency, minimal sufficiency, completeness, UMVUE, the Rao–Blackwell and Lehmann–Scheffé theorems.
5. Hypothesis-testing: likelihood-ratio tests.
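As a minimal illustration of topic 3, the sketch below computes the maximum likelihood estimator of an exponential rate in plain Python and compares it with the theoretical Cramér–Rao bound; the true rate and sample size are arbitrary choices for the example, not course material.

```python
import random

def exponential_mle(sample):
    """Closed-form MLE for the rate of an exponential distribution: the
    log-likelihood n*log(lam) - lam*sum(x) is maximized at lam = n / sum(x)."""
    return len(sample) / sum(sample)

random.seed(42)
true_rate = 2.0
sample = [random.expovariate(true_rate) for _ in range(50_000)]

lam_hat = exponential_mle(sample)
# Cramér–Rao lower bound on the variance of unbiased estimators of lam: lam^2 / n
crlb = true_rate ** 2 / len(sample)
print(f"MLE: {lam_hat:.3f}, CRLB: {crlb:.2e}")
```

With 50,000 observations the estimate lands very close to the true rate, illustrating consistency as well.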
Text: Introduction to Mathematical Statistics (6th, 7th, or 8th edition), by R.V. Hogg and A.T. Craig, Prentice Hall.
Recommended reading (for problems, examples, etc.): Statistical Inference (2nd edition), by G. Casella and R.L. Berger, Duxbury, 2002.
Evaluation: Assignments (4), Midterm exam, Final exam.
This course is an introduction to Bayesian modeling in data science and machine learning, demonstrating how to perform inference about hypotheses from data. Topics may include Bayesian decision-making, de Finetti’s representation theorem, Bayesian parametric methods and inference, conjugate models, methods for prior specification and elicitation, hierarchical models, computational approaches to inference, Markov chain Monte Carlo methods (including Metropolis–Hastings), nonparametric Bayesian inference, and Bayesian machine learning. The course will include programming in Python or R.
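One of the computational approaches listed above, the Metropolis–Hastings algorithm, can be sketched in a few lines of Python; the standard-normal target, step size, and chain length below are illustrative choices, not a prescription from the course.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_steps, step=1.0, seed=0):
    """Random-walk Metropolis–Hastings: propose x' = x + step*N(0,1) and
    accept with probability min(1, target(x') / target(x))."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_steps):
        proposal = x + step * rng.gauss(0.0, 1.0)
        log_alpha = log_target(proposal) - log_target(x)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x = proposal
        chain.append(x)
    return chain

# Target: a standard normal "posterior", known only up to a constant
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=5.0, n_steps=20_000)
burned = chain[5_000:]  # discard burn-in
mean = sum(burned) / len(burned)
print(f"posterior mean estimate: {mean:.3f}")
```

Only the log-density up to an additive constant is needed, which is exactly why such methods are useful for posteriors with intractable normalizing constants.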
Linear regression. Generalized linear models. Variable selection methods. Model validation. Mixed models. Generalized estimating equations. Coverage of the theoretical aspects and of the practical implementation of all these models and methods with statistical software.
Distribution-free procedures for the two-sample problem: Wilcoxon rank-sum, Siegel–Tukey, and Smirnov tests. Shift model: power and estimation. Single-sample procedures: sign and Wilcoxon signed-rank tests. Nonparametric ANOVA: Kruskal–Wallis and Friedman tests. Association: Spearman's rank correlation, Kendall's tau. Goodness of fit: Pearson's chi-square, likelihood-ratio, and Kolmogorov–Smirnov tests. Statistical software packages are used.
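As a small illustration of the rank-based association measures mentioned here, the plain-Python sketch below computes Kendall's tau (the tau-a version, which assumes no ties); the data are made up for the example.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs divided by the
    total number of pairs.  Assumes no ties in x or in y."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [1, 3, 2, 5, 4]            # mostly increasing with x
print(kendall_tau(x, y))       # → 0.6
```

Because the statistic depends only on ranks, it is invariant under any strictly increasing transformation of either variable.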
Multivariate normal and chi-squared distributions; quadratic forms. Multiple linear regression estimators and their properties. General linear hypothesis tests. Prediction and confidence intervals. Asymptotic properties of least squares estimators. Weighted least squares. Variable selection and regularization. Selected advanced topics in regression. Applications to experimental and observational data.
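The least-squares estimators discussed here have a closed form; the sketch below, a simple one-covariate case with invented data, solves the normal equations directly.

```python
def ols_simple(x, y):
    """Least-squares fit of y = a + b*x via the closed-form normal
    equations: b = cov(x, y) / var(x), a = mean(y) - b * mean(x)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    a = my - b * mx
    return a, b

x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.1, 2.9, 5.2, 6.8, 9.1]   # roughly y = 1 + 2x plus noise
a, b = ols_simple(x, y)
print(f"intercept={a:.2f}, slope={b:.2f}")
```

The multiple-regression case replaces these scalar formulas with the matrix normal equations, but the principle is identical.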
Distribution theory, stochastic models and multivariate transformations. Families of distributions including location-scale families, exponential families, convolution families, exponential dispersion models and hierarchical models. Concentration inequalities. Characteristic functions. Convergence in probability, almost surely, in Lp and in distribution. Laws of large numbers and Central Limit Theorem. Stochastic simulation.
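The Central Limit Theorem listed above is easy to check by stochastic simulation; the sketch below standardizes means of Uniform(0,1) samples (mean 1/2, variance 1/12) and verifies that they look approximately N(0,1). The sample sizes are arbitrary choices for the example.

```python
import random
import statistics

random.seed(1)

def standardized_mean(n):
    """Mean of n Uniform(0,1) draws, centered and scaled.  By the CLT this
    is approximately N(0, 1) for large n, since Uniform(0,1) has mean 1/2
    and variance 1/12."""
    xbar = sum(random.random() for _ in range(n)) / n
    return (xbar - 0.5) / ((1 / 12) ** 0.5 / n ** 0.5)

z = [standardized_mean(100) for _ in range(5_000)]
m = statistics.fmean(z)
s = statistics.stdev(z)
print(f"mean={m:.3f}, stdev={s:.3f}")
```

The empirical mean and standard deviation of the standardized means come out close to 0 and 1, as the theorem predicts.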
This course will introduce a range of random graph processes and of random processes on graphs. I intend to cover the following models and topics, time permitting.
Copulas are multivariate distributions whose margins are uniform on the unit interval. They provide a handy tool for the modeling of dependence between variables whose distributions are heterogeneous or involve covariates. This allows in particular for the construction of very versatile dependence models that go beyond the multivariate Gaussian distribution. These models are now extensively used in various applications, e.g., in hydrology, finance, insurance, and risk management. This course will provide an introduction to statistical inference for copula models. Topics include: Sklar's representation theorem; classical copula families; dependence measures; rank-based methods for model estimation, validation and selection; dependence modeling in high-dimensions using vines, hierarchical models and factor copulas; adjustments in the presence of ties.
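As an illustration of sampling from a classical copula family, the sketch below draws from a Clayton copula by conditional inversion and checks the empirical Kendall's tau against the theoretical value θ/(θ+2); the parameter θ = 2 and the sample size are arbitrary, and the brute-force pairwise tau computation is for illustration only.

```python
import random

def clayton_sample(theta, n, seed=0):
    """Sample (U, V) from a Clayton copula (theta > 0) by conditional
    inversion: the conditional cdf of V given U = u inverts in closed form.
    Kendall's tau for Clayton is theta / (theta + 2)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = rng.random()
        w = rng.random()
        v = ((w ** (-theta / (1 + theta)) - 1) * u ** (-theta) + 1) ** (-1 / theta)
        pairs.append((u, v))
    return pairs

pairs = clayton_sample(theta=2.0, n=1_000)
# Empirical Kendall's tau should be near theta / (theta + 2) = 0.5
conc = disc = 0
for i in range(len(pairs)):
    for j in range(i + 1, len(pairs)):
        s = (pairs[i][0] - pairs[j][0]) * (pairs[i][1] - pairs[j][1])
        conc += s > 0
        disc += s < 0
tau = (conc - disc) / (conc + disc)
print(f"empirical tau: {tau:.2f}")
```

Both margins are uniform on (0, 1) by construction, so all the dependence is captured by the copula itself, which is precisely Sklar's decomposition.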
Multi-way contingency tables. Measures of association. Relative risk, odds ratio. Exact and asymptotic tests. Logistic, Poisson, multinomial, and cumulative logistic regression. Log-linear models. Graphical models.
Descriptive techniques. Stationary processes. Best linear prediction. ARMA, ARIMA, and seasonal models. Estimation and forecasting in ARMA models. Elements of spectral analysis. ARCH and GARCH models.
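A minimal simulation sketch for the simplest ARMA case, an AR(1) process, estimating the autoregressive coefficient from the sample lag-1 autocorrelation; the coefficient 0.7 and the series length are arbitrary choices for the example.

```python
import random

def simulate_ar1(phi, n, sigma=1.0, seed=0):
    """Simulate an AR(1) process X_t = phi * X_{t-1} + eps_t with Gaussian
    innovations, started at zero."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, sigma))
    return x

def lag1_autocorr(x):
    """Sample lag-1 autocorrelation, a moment estimator of phi in an AR(1)."""
    m = sum(x) / len(x)
    num = sum((x[t] - m) * (x[t - 1] - m) for t in range(1, len(x)))
    den = sum((xt - m) ** 2 for xt in x)
    return num / den

series = simulate_ar1(phi=0.7, n=20_000)
phi_hat = lag1_autocorr(series)
print(f"estimated phi: {phi_hat:.2f}")
```

For a stationary AR(1), the lag-1 autocorrelation equals phi exactly, which is why this simple moment estimator works.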
Functions of random variables, moment-generating functions, some probability inequalities and identities, families of distributions including the exponential family, random vectors, the multivariate normal distribution, conditional expectations, mixtures and hierarchical models. Convergence theorems, simulation methods, order statistics, sufficiency, likelihood. Point and interval estimation: construction of estimators and evaluation criteria, Bayesian methods. Asymptotic normality and asymptotic relative efficiency.
Conditional expectation. Prediction. Statistical models, exponential families, sufficiency. Estimation methods: maximum likelihood, least squares, etc. Optimality: minimum-variance unbiased estimators, the information inequality. Asymptotic properties of estimators. Confidence intervals and precision. Basic elements of the theory of hypothesis testing. P-values, power in relation to sample size. The relationship between tests and confidence intervals. Tests for discrete data.
Study of the classical sampling distributions: Hotelling's T²; the Wishart distribution; distributions of eigenvalues and eigenvectors; distributions of correlation coefficients. Multivariate analysis of variance. Tests of independence of several subvectors. Tests of equality of covariance matrices. Special topics.
Random numbers. Simulation of classical distributions. Inversion and rejection methods. Specialized algorithms. Simulation of discrete- and continuous-time Markov chains. Numerical solution of ordinary and stochastic differential equations. Euler and Runge–Kutta numerical methods. The Feynman–Kac formula. Discretization. Weak and strong, explicit and implicit approximations. Variance reduction. Analysis of simulated data. Special topics.
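The explicit Euler scheme mentioned among these topics can be sketched in a few lines; the test equation dy/dt = -y with known solution e^{-t} is a standard illustrative choice, and the step count is arbitrary.

```python
import math

def euler(f, y0, t0, t1, n_steps):
    """Explicit Euler scheme: y_{k+1} = y_k + h * f(t_k, y_k) with
    constant step h = (t1 - t0) / n_steps."""
    h = (t1 - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        y += h * f(t, y)
        t += h
    return y

# dy/dt = -y with y(0) = 1 has exact solution y(t) = exp(-t)
approx = euler(lambda t, y: -y, y0=1.0, t0=0.0, t1=1.0, n_steps=10_000)
print(approx, math.exp(-1.0))
```

The global error of explicit Euler is O(h), so halving the step roughly halves the error; Runge–Kutta schemes improve this order.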
Theory and application of classical multivariate data analysis methods: principal component analysis, dimensionality reduction, simple and multiple correspondence analysis, discriminant analysis, hierarchical and non-hierarchical clustering, optimal choice of the number of clusters. Introduction to artificial neural networks. Use of statistical software for data processing.
This is an introductory course on linear time series models, as well as model estimation and prediction techniques for such series. Both frequency domain and time domain techniques are considered.
Lévy processes are stochastic processes with stationary, independent increments. They are often used to describe random phenomena with fluctuations involving jumps.
In this course, we will mainly introduce Lévy processes with one-sided jumps and the associated fluctuation theory. The following topics will be covered: the Lévy–Itô decomposition, subordinators, exponential martingales and the Esscher transform, scale functions, solutions to exit problems, potential measures, the Wiener–Hopf factorization, and reflected Lévy processes with the associated excursion processes. Time permitting, we will also briefly introduce applications of such Lévy processes to population models and risk theory. There will be no exam for this course. Each student is expected to give a short presentation on a related topic.
Emphasis is on the probabilistic aspects (stochastic processes), although some estimation (inference) questions will also be discussed.
Selected topics from the following: exploratory data analysis; resampling (jackknife, bootstrap); smoothing (density estimation), nonparametric regression, splines; optimization (maximization problems), the expectation-maximization (EM) algorithm; Monte Carlo methods (introduction, integration, optimization).
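The bootstrap mentioned among these topics can be illustrated with a short sketch estimating the standard error of the sample mean; the Gaussian data and the number of replicates are arbitrary choices for the example.

```python
import random
import statistics

def bootstrap_se(sample, stat, n_boot=2_000, seed=0):
    """Bootstrap standard error: resample the data with replacement,
    recompute the statistic on each resample, and report the standard
    deviation of the replicates."""
    rng = random.Random(seed)
    n = len(sample)
    reps = [stat([sample[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_boot)]
    return statistics.stdev(reps)

random.seed(3)
data = [random.gauss(10.0, 2.0) for _ in range(100)]
se_hat = bootstrap_se(data, statistics.fmean)
# theory for the mean: SE = sigma / sqrt(n) = 2 / 10 = 0.2
print(f"bootstrap SE of the mean: {se_hat:.3f}")
```

The same function works unchanged for statistics with no simple variance formula (the median, a trimmed mean), which is the method's main appeal.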
Exponential families, link functions. Inference and parameter estimation for generalized linear models; model selection using analysis of deviance. Residuals. Contingency table analysis, logistic regression, multinomial regression, Poisson regression, log-linear models. Multinomial models. Overdispersion and quasi-likelihood. Applications to experimental and observational data.
Simple random sampling, domains, ratio and regression estimators, superpopulation models, stratified sampling, optimal stratification, cluster sampling, sampling with unequal probabilities, multistage sampling, complex surveys, nonresponse.
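The ratio estimator mentioned here admits a one-line implementation; the survey data below are entirely hypothetical, as is the known auxiliary total.

```python
def ratio_estimate(y_sample, x_sample, x_pop_total):
    """Ratio estimator of a population total: T_hat = (ybar / xbar) * X,
    where X is the known population total of the auxiliary variable x."""
    ybar = sum(y_sample) / len(y_sample)
    xbar = sum(x_sample) / len(x_sample)
    return (ybar / xbar) * x_pop_total

# Hypothetical farm survey: y = harvest, x = acreage,
# with a known total acreage of 10,000 for the whole population
y = [12.0, 18.5, 9.0, 25.0, 14.5]
x = [10.0, 16.0, 8.0, 22.0, 12.0]
t_hat = ratio_estimate(y, x, x_pop_total=10_000)
print(t_hat)
```

The estimator gains precision over the expansion estimator whenever y is roughly proportional to the auxiliary variable x, as in this invented example.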
Sufficiency, minimal and complete sufficiency, ancillarity. Fisher and Kullback-Leibler information. Elements of decision theory. Theory of estimation and hypothesis testing from the Bayesian and frequentist perspective. Elements of asymptotic statistics including large-sample behaviour of maximum likelihood estimators, likelihood-ratio tests, and chi-squared goodness-of-fit tests.
Elliptical distributions. Location and dispersion estimators. Robust estimators. Multiple, partial, and canonical correlations. Parametric, permutation, and bootstrap tests. Classification. Principal component analysis. Forecasting.
Principles of inference: point estimation, sampling distributions of estimators, hypothesis testing, confidence regions. The Bayesian approach. Resampling methods. Nonparametric estimation. Modern applications of statistics.
Fundamental problems of machine learning in the continuous space: modeling density (energy-based models), modeling samplers (Variational Auto-Encoders), and modeling both (normalizing flows). The limit of an infinite number of generation steps (continuous normalizing flows and diffusion models). Modeling discrete distributions (discrete diffusion models and auto-regressive models).
Stochastic processes (generalities). Description and characteristics of time series. Fourier transforms. Statistical analysis of time series. Spectral analysis of linear processes. Smoothing of spectral estimators.
Linear regression, mixed models, generalized linear models, copula regression, nonparametric methods, model selection, measurement error, missing data. Use of the R software.
Theory of general linear models. Theory of generalized linear models. Logistic regression. Log-linear models.
This course introduces the theoretical foundations of statistical learning, presenting the notions of risk and loss functions, model complexity, optimization methods, and the principles of stability and regularization. It aims to give each student a solid understanding of the mechanisms that govern the performance of modern learning methods, and to offer a unified perspective on several classical models.