Short Courses (May 31 and June 1, 2025)


A Short Course on Optimization for Data Science and Machine Learning Problems

Instructor: George Michailidis

Abstract: Optimization lies at the heart of modern data science, offering scalable solutions for high-dimensional problems in statistics and machine/deep learning. The first part of the course will cover: (i) the fundamentals of gradient-based optimization and (ii) advanced optimization methods. These algorithms will be illustrated through applications in high-dimensional statistics and machine learning, including sparse regression, matrix completion, graphical models and feed-forward neural networks. The second part will explore key recent developments in optimization driven by challenges in machine and deep learning. It will briefly cover: (i) Federated and distributed learning, where decentralized optimization techniques enable efficient model training across multiple devices while preserving data privacy. (ii) Minimax optimization, a powerful framework for adversarial learning, robust statistics, and generative modeling. (iii) Bilevel optimization, which has gained prominence in the last 2-3 years for applications such as hyperparameter tuning, meta-learning, and reinforcement learning. The course will balance core concepts with sufficient technical depth, providing an accessible yet insightful perspective on the latest advances in optimization.
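
To fix ideas before the outline, the sketch below contrasts plain gradient descent with its heavy-ball momentum variant on a least-squares objective. This is a minimal R illustration under our own assumptions (toy data, illustrative step size and momentum coefficient), not material from the course itself.

    # Gradient descent vs. heavy-ball momentum on f(b) = 0.5 * ||y - X b||^2,
    # whose gradient is -t(X) %*% (y - X %*% b).
    set.seed(1)
    n <- 100; p <- 5
    X <- matrix(rnorm(n * p), n, p)
    y <- X %*% rnorm(p) + rnorm(n)

    grad <- function(b) -crossprod(X, y - X %*% b)  # gradient of f at b

    eta <- 1e-3   # step size (illustrative; typically <= 1/L for L-smooth f)
    gam <- 0.9    # momentum coefficient

    b_gd  <- rep(0, p)                   # plain gradient-descent iterate
    b_mom <- rep(0, p); v <- rep(0, p)   # momentum iterate and velocity
    for (k in 1:500) {
      b_gd  <- b_gd - eta * grad(b_gd)
      v     <- gam * v - eta * grad(b_mom)
      b_mom <- b_mom + v
    }
    # Both approach the least-squares solution; the momentum iterate
    # typically gets there in noticeably fewer iterations.
    cbind(b_gd, b_mom, solve(crossprod(X), crossprod(X, y)))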

Course Outline

Part I: (a) Fundamentals of Gradient-Based Optimization
  • Basics of optimization: introduction to unconstrained and constrained optimization problems
  • Gradient Descent
  • Stochastic Gradient Descent
  • Accelerating Gradient Descent - Momentum Methods
(b) Advanced Optimization Methods
  • Proximal methods
  • Splitting methods (e.g., ADMM)
  • Block coordinate methods
Part II: (a) Fundamentals of Distributed and Federated Learning
  • Introduction to distributed optimization
  • Federated Learning basics
  • Stochastic and Asynchronous Methods. Stochastic Gradient Descent in distributed settings; asynchronous and heterogeneous updates; variance reduction and adaptive methods
(b) Minimax Optimization and Adversarial Learning
  • Optimization algorithms for Minimax problems. Gradient Descent-Ascent; optimistic and Extra-Gradient methods; convergence analysis and stability (contrasted in the sketch following this outline)
  • Applications in Deep Learning. Generative Adversarial Networks (GANs); Robust optimization and adversarial training; domain adaptation and fairness in AI
(c) Bilevel Optimization in Deep Learning
  • Motivation and problem formulation
  • Algorithms for bilevel optimization. Approximate gradient methods; implicit differentiation and hypergradient methods; computational complexity and scalability
  • Applications in Deep Learning. Hyperparameter optimization; Neural Architecture Search (NAS); Meta-Learning and AutoML
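
The convergence-and-stability bullet in Part II(b) can be previewed with a small experiment: on the bilinear saddle-point problem min over x, max over y of f(x, y) = xy, simultaneous gradient descent-ascent spirals away from the saddle point while the extra-gradient method converges to it. The R sketch below uses an illustrative step size of our own choosing.

    # Gradient descent-ascent (GDA) vs. extra-gradient on f(x, y) = x * y,
    # whose unique saddle point is (0, 0).
    eta <- 0.1
    x_gda <- 1; y_gda <- 1   # GDA iterate
    x_eg  <- 1; y_eg  <- 1   # extra-gradient iterate
    for (k in 1:200) {
      # GDA: descend in x and ascend in y simultaneously -- the norm grows
      x_new <- x_gda - eta * y_gda
      y_new <- y_gda + eta * x_gda
      x_gda <- x_new; y_gda <- y_new
      # Extra-gradient: evaluate the gradient at a look-ahead half step,
      # then update from the original point -- the norm shrinks
      x_half <- x_eg - eta * y_eg
      y_half <- y_eg + eta * x_eg
      x_eg <- x_eg - eta * y_half
      y_eg <- y_eg + eta * x_half
    }
    c(gda = sqrt(x_gda^2 + y_gda^2),  # diverges: norm grows ~2.7x here
      eg  = sqrt(x_eg^2 + y_eg^2))    # contracts toward 0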

Statistical Methods for Composite Time-to-Event Outcomes: Win Ratio and Beyond

Instructor: Dr. Lu Mao (University of Wisconsin–Madison)

Lu Mao is an Associate Professor in the Department of Biostatistics and Medical Informatics at the University of Wisconsin–Madison. He joined the department as an Assistant Professor after earning his PhD in Biostatistics from UNC Chapel Hill in 2016. His research interests include survival analysis, particularly composite outcomes, as well as causal inference, semiparametric theory, and clinical trials. He is currently the principal investigator of an NIH R01 grant on statistical methodology for composite time-to-event outcomes in cardiovascular trials and an NSF grant on causal inference in randomized trials with noncompliance. Beyond methodological research, he collaborates with medical researchers in cardiology, radiology, oncology, and health behavioral interventions, where time-to-event and longitudinal data are routinely analyzed. He has also taught several short courses on statistical methods for composite outcomes to broad audiences, including a recent one at the 2024 Joint Statistical Meetings (JSM) in Portland, OR.

Course Outline

  1. Introduction: Real examples, regulatory guidelines, traditional methods.
  2. Hypothesis testing: Standard win ratio (sketched after this outline) and its properties, sample size calculation, max-combo test, software and case studies.
  3. Nonparametric estimation: Time-restricted effect sizes (WR, net benefit, win odds), software and case studies.
  4. Semiparametric regression: Proportional win-fractions regression, model diagnostics, variable selection and prediction, time-restricted effect sizes (WR, net benefit, win odds), software and case studies.
  5. Miscellaneous: Meta-analysis, group-sequential designs.
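
For orientation, the core estimands of items 2-3 can be written down compactly. In the standard (unmatched) win ratio, every treated patient is compared with every control patient, and each pair is adjudicated hierarchically (e.g., death first, then nonfatal hospitalization). In our own notation (not taken from the course materials), with \(N_w\) pairs won by the treated patient, \(N_l\) pairs lost, and \(N_t\) pairs tied:

\[
\widehat{\mathrm{WR}} = \frac{N_w}{N_l}, \qquad
\widehat{\mathrm{NB}} = \frac{N_w - N_l}{N_w + N_l + N_t}, \qquad
\widehat{\mathrm{WO}} = \frac{N_w + N_t/2}{N_l + N_t/2},
\]

the win ratio, net benefit, and win odds, respectively. The time-restricted versions in items 3-4 adjudicate wins only up to a fixed follow-up time \(\tau\).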

Statistical Tests for Bioequivalence and Biosimilarity

Instructor: Dr. Elena Rantou (FDA)

Elena Rantou, PhD, is a Master Scientist in the Office of Biostatistics/OTS/CDER. She joined FDA in 2013, and since 2019 she has been a lead mathematical statistician working with generic and biosimilar products. Her research focuses mainly on assessing the bioequivalence of topical/dermatological generic products, characterizing outliers in replicate PK studies, detecting data anomalies, and using AI/ML in drug development. She has contributed to various working groups and has worked on guidance development. She is part of the leadership of the FDA Modeling and Simulation working group and co-chairs the AI/ML and the digital health technologies (DHT) Regulatory Review Committee in the Office of Biostatistics. Elena holds a PhD from American University, Washington, DC, and prior to joining the FDA she worked in academia and as a statistical consultant for over 15 years.

Abstract: Statistical testing for bioequivalence plays a crucial role in the regulatory approval of generic drugs, ensuring that they have the same rate and extent of absorption as a reference drug. It is also used to confirm that a follow-on therapeutic biologic product, like a biosimilar monoclonal antibody, is highly similar to its reference biologic, with no clinically meaningful differences. This half-day course will cover various types of bioequivalence studies, including in-vivo pharmacokinetic (PK) studies, comparative clinical endpoint studies, and in-vitro studies. We will explore the different statistical tests applicable to each type of study, addressing both continuous and discrete endpoints. These concepts will be explained in theory and illustrated with examples from approved marketing applications. Furthermore, challenges encountered during the review of these studies have led to the development of advanced regulatory statistical methodologies. These challenges include issues like outliers, statistical power, sample size considerations, study design, and variability in drug performance. The course will highlight these challenges and demonstrate how these are addressed so that bioequivalence assessments are both accurate and reliable.
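
As a concrete illustration of the most common case, average bioequivalence for a PK endpoint is typically assessed with the two one-sided tests (TOST) procedure on the log scale, which is equivalent to checking that the 90% confidence interval for the geometric mean ratio lies within 80.00%-125.00%. The R sketch below assumes a simple parallel-group design with hypothetical vectors auc_test and auc_ref; a crossover study would instead use a linear (mixed) model with sequence, period, and subject terms.

    # TOST for average bioequivalence on log-transformed AUC values.
    # auc_test and auc_ref are hypothetical placeholder vectors.
    fit <- t.test(log(auc_test), log(auc_ref),
                  conf.level = 0.90, var.equal = TRUE)
    ci_gmr <- exp(fit$conf.int)   # 90% CI for the geometric mean ratio

    # Bioequivalence is concluded if the 90% CI lies entirely within the
    # usual limits (equivalent to two one-sided tests at alpha = 0.05).
    bioequivalent <- ci_gmr[1] >= 0.80 && ci_gmr[2] <= 1.25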


Statistical Methods for Time-to-Event Data from Multiple Sources: A Causal Inference Perspective

Instructors: Dr. Xiaofei Wang (Duke University) and Dr. Shu Yang (North Carolina State University)

Xiaofei Wang is a Professor of Biostatistics and Bioinformatics at Duke University School of Medicine and the Director of Statistics for the Alliance Statistics and Data Management Center. Dr. Wang has been involved in clinical trials, observational studies, and translational studies for Alliance/CALGB and the Duke Cancer Institute. His methodology research has been funded by NIH, with a focus on biased sampling, causal inference, survival analysis, methods for predictive and diagnostic medicine, and clinical trial design. He is an Associate Editor for Statistics in Biopharmaceutical Research and an elected Fellow of the American Statistical Association (ASA).

Shu Yang is an Associate Professor of Statistics, Goodnight Early Career Innovator, and University Faculty Scholar at North Carolina State University. Her primary research interest is causal inference and data integration, particularly with applications to comparative effectiveness research in health studies. She also works extensively on methods for missing data and spatial statistics. Dr. Yang has been a Principal Investigator on U.S. NSF, NIH, and FDA research projects. She is a recipient of the COPSS Emerging Leader Award.

Abstract: The short course will review important statistical methods for survival data arising from multiple data sources, including randomized clinical trials and observational studies. It consists of four parts, all discussed within a unified causal inference framework. In each part, we will review the theoretical background and, supplemented with data examples, emphasize the application of these methods in practice and their implementation in freely available statistical software. Each part takes approximately two hours to cover.

Part 1: (Instructor: Xiaofei Wang)
In Part 1, we will review key issues and methods in designing randomized clinical trials (RCTs). We will discuss statistical tests, such as the log-rank test and its weighted variants, inference for the hazard ratio under the Cox proportional hazards (PH) model, and causal estimands based on survival functions (e.g., the restricted mean survival time (RMST) difference). Examples and data from cancer clinical trials will be used to illustrate these methods. In addition, standard survival analysis methods, such as the Kaplan-Meier estimator, the log-rank test, and Cox PH models, are commonly used to analyze survival data arising from observational studies, in which treatment groups are not randomly assigned as in RCTs. We will first introduce the statistical framework of causal inference and then shift the focus to causal inference methods for survival data. We will review various methods that allow valid visualization of and testing for confounder-adjusted survival curves and RMST differences, including the g-formula, inverse probability of treatment weighting, propensity score matching, calibration weighting, and augmented inverse probability of treatment weighting. Examples and data from cancer observational studies will be used to illustrate these methods.
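
A minimal sketch of this toolkit in R, using the survival package; the data frame dat and the variables (time, status, trt, age, sex) are hypothetical placeholders, and the propensity model is deliberately simple:

    library(survival)

    # RCT-style analyses: Kaplan-Meier curves, log-rank test, Cox PH model
    km <- survfit(Surv(time, status) ~ trt, data = dat)
    survdiff(Surv(time, status) ~ trt, data = dat)          # log-rank test
    summary(coxph(Surv(time, status) ~ trt, data = dat))    # hazard ratio

    # Observational-data analysis: inverse probability of treatment
    # weighting (IPTW) to obtain confounder-adjusted survival curves
    ps <- glm(trt ~ age + sex, family = binomial, data = dat)$fitted.values
    w  <- ifelse(dat$trt == 1, 1 / ps, 1 / (1 - ps))
    km_adj <- survfit(Surv(time, status) ~ trt, data = dat, weights = w)

    # RMST comparison up to tau = 24 (e.g., months) from the weighted fit
    print(km_adj, rmean = 24)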

Part 2: (Instructor: Shu Yang)
In Part 2, we will cover objectives and methods for integrative analyses of data from RCTs and observational studies. These methods exploit the complementary features of RCTs and observational studies to estimate the average treatment effect (ATE), the heterogeneity of treatment effects (HTE), and individualized treatment rules (ITRs) over a target population. First, we will review existing statistical methods for generalizing RCT findings to a target population, leveraging the representativeness of observational studies. Due to population heterogeneity, the ATE and ITRs estimated from an RCT may lack external validity, i.e., generalizability to a target population. We will review statistical methods for conducting generalizable RCT analyses of the target-population ATE and ITRs, including inverse probability of sampling weighting, calibration weighting, outcome regression, and doubly robust estimators, as sketched below. R software and applications will also be covered. Second, we will review existing statistical methods for integrating RCTs and observational studies for robust and efficient estimation of the HTE. RCTs are regarded as the gold standard for treatment effect evaluation because treatment is randomized, yet they may be underpowered to detect HTEs due to practical limitations. Large observational studies, on the other hand, contain rich information on how patients respond to treatment, but that information may be confounded. We will review statistical methods that achieve robust and efficient estimation of the HTE by leveraging the treatment randomization in RCTs and the rich information in observational studies, including test-based integrative analysis, selective borrowing, and confounding function modeling. R software and applications will also be covered.
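
As one concrete example from the first half of Part 2, the inverse probability of sampling weighting (IPSW) estimator of the ATE over the combined trial-plus-cohort population can be written, in our own (standard but not course-specific) notation, as

\[
\hat{\tau}_{\mathrm{IPSW}}
= \frac{\sum_{i \in \mathcal{R}} \hat{w}_i A_i Y_i / e_i}{\sum_{i \in \mathcal{R}} \hat{w}_i A_i / e_i}
- \frac{\sum_{i \in \mathcal{R}} \hat{w}_i (1 - A_i) Y_i / (1 - e_i)}{\sum_{i \in \mathcal{R}} \hat{w}_i (1 - A_i) / (1 - e_i)},
\qquad \hat{w}_i = \frac{1}{\hat{\pi}(X_i)},
\]

where \(\mathcal{R}\) indexes trial participants, \(A_i\) is the randomized treatment, \(e_i\) the known randomization probability, \(Y_i\) the outcome, and \(\hat{\pi}(X_i)\) an estimated probability of trial participation given covariates \(X_i\). Calibration weighting replaces \(\hat{w}_i\) with weights that match covariate moments between the trial and the target sample, and doubly robust versions augment the weighting with an outcome regression.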

Learning Strategy
The course material will blend concepts, methods, and real-data applications. It will also describe how to implement the methods using R packages.

Pre-requisites
Attendees are expected to be familiar with survival analysis and some concepts of causal inference, but a deep understanding of the general principles of causal inference is not required.


From Estimands to Robust Inference of Treatment Effects in Platform Trials

Instructor: Dr. Ting Ye (University of Washington)

Ting Ye is an Assistant Professor in Biostatistics at the University of Washington. Her research aims to accelerate human health advances through data-driven discovery, development, and delivery of clinical, medical, and scientific breakthroughs, spanning the design and analysis of complex innovative clinical trials, causal inference in biomedical big data, and quantitative medical research. Ting is a recipient of the School of Public Health's Genentech Endowed Professorship and the NIH Maximizing Investigators' Research Award (MIRA). Ting is a leader in covariate adjustment for randomized clinical trials. She has published over ten papers in this area, including four in top-tier journals such as JASA, JRSSB, and Biometrika, two of which have been cited in the FDA's official guidance. The RobinCar R package, developed by her research group, has become a standard software tool in the field. She is also the co-founder and co-chair of an ASA Biopharmaceutical Section Scientific Working Group on Covariate Adjustment.

Abstract: A platform trial is an innovative clinical trial design that uses a master protocol (i.e., one overarching protocol) to evaluate multiple treatments in an ongoing manner and can accelerate the evaluation of new treatments. However, its flexibility introduces inferential challenges, with two fundamental ones being the precise definition of treatment effects and robust, efficient inference on these effects. Central to these challenges is defining an appropriate target population for the estimand, as the populations represented by some commonly used analysis approaches can arbitrarily depend on the randomization ratio or trial type. In this short course, we will first establish a clear framework for constructing clinically meaningful estimands with precise specification of the population of interest. In particular, we introduce the concept of the Entire Concurrently Eligible (ECE) population, which preserves the integrity of randomized comparisons while remaining invariant to both the randomization ratio and trial type. This framework provides a solid foundation for future design, analysis, and research in platform trials. Next, we will present weighting and post-stratification methods for estimation of treatment effects with minimal assumptions. To fully leverage the efficiency potential of platform trials, we will also present model-assisted approaches for baseline covariate adjustment to gain efficiency while maintaining robustness against model misspecification. Additionally, we will discuss and compare the asymptotic distributions of the proposed estimators and introduce robust variance estimators. Throughout the course, we will illustrate these concepts and methods through case studies and demonstrate their implementation using the R package RobinCID.
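
To give a flavor of the estimation strategies, a generic post-stratification estimator of the treatment effect takes the form (our own schematic notation; the course develops the precise platform-trial version)

\[
\hat{\tau} = \sum_{s=1}^{S} \frac{n_s}{n} \left( \bar{Y}_{1s} - \bar{Y}_{0s} \right),
\]

where the strata \(s = 1, \ldots, S\) are chosen so that randomization is concurrent and the randomization ratio constant within each stratum (so treated and control patients are compared only against concurrent counterparts), \(n_s\) is the stratum size, and \(\bar{Y}_{as}\) is the mean outcome under arm \(a\) in stratum \(s\). Weighting methods replace the proportions \(n_s/n\) with weights targeting the population of interest, such as the ECE population.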

Course Outline

  1. Background and overview of key concepts: 20 min
  2. Estimand and case studies: 40 min
  3. Methods for continuous and discrete endpoints: 40 min
  4. Introduction of R package and demonstration (part 1): 20 min
  5. Break: 10 min
  6. Methods for survival endpoints: 35 min
  7. Introduction of R package and demonstration (part 2): 15 min

Statistics Meets Tensors: Methods, Theory, and Applications

Instructor: Dr. Anru Zhang (Duke University)

Anru Zhang is the Eugene Anson Stead, Jr. M.D. Associate Professor at Duke University, jointly appointed in the Department of Biostatistics & Bioinformatics and the Department of Computer Science. He obtained his bachelor's degree from Peking University in 2010 and his Ph.D. from the University of Pennsylvania in 2015. His work focuses on high-dimensional statistical inference, tensor learning, generative models, and applications in electronic health records and microbiome data analysis. He has won the IMS Tweedie Award, the COPSS Emerging Leader Award, and the ASA Gottfried E. Noether Junior Award. His research is currently supported by two NIH R01 grants (as PI and MPI) and an NSF CAREER Award.

Abstract:
High-dimensional high-order (tensor) data are data organized as large-scale arrays spanning three or more dimensions, and they have become increasingly prevalent across fields including biology, medicine, psychology, education, and machine learning. Tensor data play a particularly crucial role in biological and medical research. For instance, in longitudinal microbiome research, microbiome samples are collected from multiple subjects (units) at multiple time points to analyze the abundance of bacteria (variables) over time. Depending on the taxonomic level under investigation, there can be hundreds or thousands of bacterial taxa in the feature mode, with many taxa exhibiting strong correlations in their abundance patterns. In neurological science, techniques such as Magnetic Resonance Imaging (MRI), functional MRI, and electroencephalography (EEG) measure neurological activity across three-dimensional brain regions, and the resulting imaging data are often stored as tensors.

Compared to low-dimensional or low-order data, the distinct characteristics of high-dimensional high-order data pose unprecedented challenges to the statistics community. For the most part, classical methods and theory tailored to matrix data may no longer apply to high-order data. Previous studies have attempted to address this issue by transforming high-order data into matrices or vectors through matricization or vectorization, but this paradigm often loses intrinsic tensor structure and, as a result, yields suboptimal outcomes in subsequent analyses. Another major challenge stems from the computational side, as the high-dimensional high-order structure introduces computational difficulties unseen in the matrix counterpart. Many fundamental concepts and methods developed for matrix data cannot be extended to high-order data in a tractable manner; for instance, naive extensions of concepts such as the operator norm, singular values, and eigenvalues all become NP-hard to compute.

From a methodology perspective, with the rapid expansion of tensor datasets, fundamental statistical analysis tools, such as dimension reduction, regression, classification, discriminant analysis, and clustering, face unique aspects and significant challenges compared to traditional statistics. These difficulties arise from both statistical and computational perspectives, giving rise to the ubiquitous phenomenon of computational-statistical tradeoffs. Given these challenges and the growing importance of tensor data analysis, we are offering the short course "Statistics Meets Tensors: Methods, Theory, and Applications" at the New England Statistics Symposium (NESS) 2025.

Tentative Course Outline

  1. Introduction: Background, applications of tensor methods to data in tensor format, and applications of tensor methods to data in other formats.

  2. Tensor algebras: Concepts (order, fibers, slices, norms, etc.), rank-one tensors, tensor decomposition, tensor rank, and uniqueness of tensor decomposition (Kruskal condition; see the sketch after this outline).

  3. Probabilistic models of tensors: Tensor regression, tensor SVD, tensor completion, high-order clustering.

  4. Methods: Power iteration, higher-order orthogonal iteration, importance sketching.

  5. Applications: Computational imaging, high-dimensional longitudinal data, multivariate relational data.

  6. Theory: Information-theoretic limits, computational-statistical trade-offs.
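
As a taste of item 2, the CP (CANDECOMP/PARAFAC) decomposition expresses an order-3 tensor \(\mathcal{X} \in \mathbb{R}^{p_1 \times p_2 \times p_3}\) as a sum of rank-one tensors (standard definition; notation ours, not taken from the course materials):

\[
\mathcal{X} = \sum_{r=1}^{R} \lambda_r \, \mathbf{a}_r \circ \mathbf{b}_r \circ \mathbf{c}_r,
\qquad \text{i.e.,} \quad
x_{ijk} = \sum_{r=1}^{R} \lambda_r \, a_{ir} \, b_{jr} \, c_{kr},
\]

where \(\circ\) denotes the outer product and the smallest such \(R\) is the tensor (CP) rank. Unlike the matrix SVD, this decomposition can be unique up to permutation and scaling: Kruskal's condition, covered in item 2, gives the sufficient criterion \(k_{\mathbf{A}} + k_{\mathbf{B}} + k_{\mathbf{C}} \ge 2R + 2\), where \(k_{\mathbf{A}}\) denotes the Kruskal rank of the factor matrix \(\mathbf{A} = [\mathbf{a}_1, \ldots, \mathbf{a}_R]\) and similarly for \(\mathbf{B}\) and \(\mathbf{C}\).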