FULL-DAY SHORT COURSES | MAY 27, 2026
HYBRID: IN-PERSON (McHugh Hall) & ONLINE OPTIONS AVAILABLE


Bayesian Statistics (Bayesics), Its History, Rationale, and Future in Clinical Trials

Instructor: Yuan Ji, University of Chicago

Abstract: Motivated by the recent release of FDA’s draft guidance on the Use of Bayesian Methodology in Clinical Trials of Drugs and Biological Products on January 12, 2026, this short course provides a concise introduction to Bayesian statistics (Bayesics), including its history, rationale, and future in clinical trials. The short course will cover the origin of Bayesics in Bayes’ Theorem, its evolution over the past centuries, and its modern applications in clinical trials. The course will consist of three parts: 1) Bayesian thinking; 2) Bayesian modeling and computation; and 3) Bayesian clinical trials. The first two parts are a general discussion of Bayesics, and the last part is dedicated to Bayesian trials and their connection to the FDA’s draft guidance. The short course is developed for statisticians who are interested in learning fundamental Bayesian thinking and its connection to clinical trials. For example, what is the difference between a type I error and the type I error rate, and how do Bayesian methods quantify the chance of making a type I error? Students are expected to gain a deep understanding of Bayesian philosophy and connect it to human logic and fundamental questions of clinical trials.
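To make that distinction concrete, below is a minimal sketch (our illustration, not course material) for a hypothetical single-arm binary-endpoint trial: the type I error rate is a long-run frequency property of the decision rule, while the Bayesian chance of a type I error is a posterior probability computed from the data actually observed. The null rate, sample size, threshold, and observed count are illustrative assumptions only.

    import numpy as np
    from scipy.stats import beta

    p0, n, threshold = 0.20, 40, 0.95      # hypothetical null rate, trial size, decision cutoff
    rng = np.random.default_rng(0)

    def prob_efficacy(x, n, a=1.0, b=1.0):
        # Posterior probability that the response rate exceeds p0 under a Beta(a, b) prior.
        return 1.0 - beta.cdf(p0, a + x, b + n - x)

    # Type I error RATE: long-run frequency with which the rule declares efficacy when p = p0.
    x_null = rng.binomial(n, p0, size=100_000)
    type1_rate = np.mean(prob_efficacy(x_null, n) > threshold)

    # Bayesian chance of a type I error for ONE observed trial: the posterior
    # probability that the null is in fact true given the data in hand.
    x_obs = 14                              # hypothetical number of observed responders
    post_prob_null = 1.0 - prob_efficacy(x_obs, n)

    print(f"type I error rate of the rule: {type1_rate:.3f}")
    print(f"posterior P(p <= p0 | x = {x_obs}): {post_prob_null:.3f}")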


Empowering Decision Making with Historical Data: From Theory to Practice of Prior Elicitation and Bayesian Clinical Trial Design

Instructors: Ming-Hui Chen (University of Connecticut), Chenguang Wang (Regeneron), and Min Lin (The Ohio State University)

Dr. Ming-Hui Chen is a Board of Trustees Distinguished Professor and Head of the Department of Statistics at the University of Connecticut (UConn). Dr. Chen's research areas include Bayesian statistics, categorical data, design of clinical trials, meta-analysis and network meta-analysis (MA and NMA), missing data, prostate cancer data, and survival data. He is an elected Fellow of the AAAS, ASA, IMS, and ISBA and has published over 500 research papers. Currently, he is Co-Editor-in-Chief of Statistics and Its Interface and the inaugural Co-Editor-in-Chief of the New England Journal of Statistics in Data Science.

Dr. Chenguang Wang is the Head of Quantitative Innovation and Statistical Strategy (QISS) at Regeneron. Previously, he served as an Associate Professor at Johns Hopkins University and worked as a Mathematical Statistician at the Center for Devices and Radiological Health (CDRH), FDA. Dr. Wang has extensive experience in clinical trial design and analysis, particularly in regulatory settings. He is also an elected Fellow of the American Statistical Association.

Dr. Min Lin is a Postdoctoral Scholar at The Ohio State University and received his Ph.D. in Statistics from the University of Connecticut, with dissertation work on Bayesian methods for borrowing external and historical data in clinical trials. His research focuses on Bayesian trial design, quantifying prior information via effective sample size, and practical, transparent Bayesian analysis. He also works on survival analysis for dependent event-time data arising from contact-network dynamics.

Course Outline

  • Section 1: Leveraging Historical and External Data
    • Section 1.1: Introduction
      • RCT basics
      • Causal inference frameworks
      • Review of historical and external control (contexts of use)
      • Propensity score adjustment
      • Static and dynamic borrowing
    • Section 1.2: Regulatory Guidance Review
      • EMA External Control Concept Paper
      • FDA External Control Guidance
      • FDA CDRH Bayesian Guidance
      • FDA CDER and CBER Bayesian Guidance
      • FDA Adaptive Design Guidance
      • ICH E20 Adaptive Design
      • FDA Pediatric Drug Development Guidance
      • Cross-guidance themes
    • Section 1.3: Cases Review
      • Cases in the most recent FDA Bayesian guidance
      • Cases from the FDA CID Program
      • Cases from recent FDA approvals
      • Cases from other sources

  • Section 2: Bayesian Sample Size Determination (SSD)

    • Section 2.1: Introduction
      • Non-inferiority and Superiority Trials
      • Literature Review
      • Formulations of the Average Coverage Criterion (ACC), Average Length Criterion (ALC), and Worst Outcome Criterion (WOC)
      • Challenges in Bayesian SSD
    • Section 2.2: Bayesian Design of Clinical Trials
      • Design of a Non-Inferiority Trial
      • Design of a Superiority Trial
      • Basic Elements of Bayesian SSD
      • Bayesian Type I Error and Power
      • The Posterior Probability (PP) Approach
      • The Bayes Factor Approach
      • Conditional Borrowing Approach
      • Bayesian Power Calculation Algorithm
    • Section 2.3: Application to Non-inferiority Medical Device Trials
      • The Historical Medical Device Data
      • Frequentist SSD
      • Power Prior and Posterior
      • Normalized Power Priors
      • Hierarchical Priors
      • Sampling Prior
      • Design Setting
      • Computation Setting
      • Bayesian Type I error and Power
      • Sensitivity Analysis
      • Methods for Controlling Type I errors
      • Discussion
    • Section 2.4: Application to Superiority DMD Trials
      • Design Settings
      • Bayesian Power Calculation Algorithm
      • Duchenne Muscular Dystrophy (DMD) Historical Data
      • Borrowing-by-Parts Power Prior (BP)
      • Hierarchical Prior (HP)
      • Robust Mixture Priors (RMP)
      • General Framework of Conditional Borrowing
      • A Simulation Study
      • Sweet Spot
      • Concluding Remarks

  • Section 3: Bayesian Effective Sample Size (ESS) for Informative Priors

    • Section 3.1: Introduction
      • Background and Motivation
      • How ESS is used at the design stage vs. the analysis stage
    • Section 3.2: Design-stage ESS
      • Optimization-based Quantification
      • Curvature-based Quantification
      • Information-ratio Quantification
    • Section 3.3: Analysis-stage ESS
      • Desired Properties (Axioms)
      • Definition and Theory
      • Normal Regression Case Study: Closed-form ESS
      • Computation
    • Section 3.4: Case Studies
      • REBYOTA Study
      • Kociba-NTP Study
    • Section 3.5: Interpretation and Reporting
      • Practical Interpretation of ESS Values
      • Reporting Design-stage vs Analysis-stage ESS in Applications

  • Section 4: Available Software on Bayesian SSD and Analysis

    • Section 4.1: Overview of available tools
    • Section 4.2: Demonstration of selected R packages

Section 1 will be taught by Dr. Wang, Section 2 by Dr. Chen, and Sections 3 and 4 by Dr. Lin.

Abstract: Making informed decisions often relies heavily on data, but collecting new data can be expensive, time-consuming, and sometimes impractical. Historical, previously collected, or concurrent real-world data provide valuable insights that can significantly empower decision making across fields such as healthcare, engineering, economics, and business.

Effective elicitation of informative priors from historical data plays a key role in determining how much historical data influences current decisions. In the first part of this short course, we provide an overview of regulatory agencies’ standpoints and recent developments in informative priors, illustrate their practical utility through case studies, and discuss strategies for improving decision making.
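One concrete example of such a prior, which also appears in the course outline above, is the power prior. In a standard formulation (notation here is ours, for illustration), the historical likelihood is discounted by a parameter a_0:

    \[
      \pi(\theta \mid D_0, a_0) \;\propto\; L(\theta \mid D_0)^{a_0}\, \pi_0(\theta),
      \qquad 0 \le a_0 \le 1,
    \]

where D_0 denotes the historical data, \pi_0 is an initial prior, a_0 = 0 discards the historical data, and a_0 = 1 pools it fully with the current trial.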

Bayesian sample size determination (SSD) has a long history, with early work traceable to the 1990s. Recently, several new methods for Bayesian design of clinical trials have been developed with a focus on controlling type I error and power. In the second part of this short course, an overview of the literature on Bayesian SSD will be provided. The general theory and various methods of Bayesian SSD will be presented. The short course will also highlight several important applications in designing clinical trials to demonstrate the advantages of Bayesian SSD.
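To give a flavor of what a simulation-based Bayesian SSD calculation looks like, the sketch below (our illustration, not course material) searches for the smallest sample size whose simulated Bayesian power reaches a target, while monitoring the Bayesian type I error of the decision rule. The endpoint, historical data, discounting parameter, and thresholds are hypothetical assumptions.

    import numpy as np
    from scipy.stats import beta

    rng = np.random.default_rng(1)
    p0, p1 = 0.20, 0.35            # null and assumed alternative response rates
    y0, n0, a0 = 9, 40, 0.5        # hypothetical historical data and discounting parameter
    threshold, target_power = 0.975, 0.80
    a_prior, b_prior = 1 + a0 * y0, 1 + a0 * (n0 - y0)   # Beta parameters from a fixed-a0 power prior

    def declare_efficacy(x, n):
        # Declare efficacy when the posterior probability of exceeding p0 is large enough.
        post_prob = 1.0 - beta.cdf(p0, a_prior + x, b_prior + n - x)
        return post_prob > threshold

    for n in range(20, 201, 5):
        x_alt = rng.binomial(n, p1, size=20_000)          # sampling prior: point mass at p1
        power = np.mean(declare_efficacy(x_alt, n))
        x_null = rng.binomial(n, p0, size=20_000)
        type1 = np.mean(declare_efficacy(x_null, n))      # Bayesian type I error of the rule
        if power >= target_power:
            print(f"n = {n}: simulated power = {power:.3f}, type I error = {type1:.3f}")
            break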

Quantifying the amount of information borrowed from historical data is essential for Bayesian trial design and final decision making. Effective sample size (ESS) provides a common scale for how much an informative prior contributes to design and inference. While ESS is often treated as a design-stage summary, there is growing interest in measures that can be updated after observing data, especially when the data may disagree with the prior. In the third part of this short course, we review existing design-stage ESS approaches and introduce a new analysis-stage ESS framework for the informative prior: in the absence of prior–data disagreement, the proposed ESS reduces to the design-stage ESS, while under disagreement it adapts to the observed data. We also discuss computational algorithms and present case studies.
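As a simple design-stage illustration (not taken from the course materials): for a binomial endpoint with a conjugate Beta prior, and for a power prior with a fixed discounting parameter a_0, the prior information has a natural pseudo-sample-size interpretation under the usual conjugate-updating convention,

    \[
      \mathrm{ESS}\{\mathrm{Beta}(a, b)\} = a + b,
      \qquad
      \mathrm{ESS}\{\text{power prior, fixed } a_0\} \approx a_0\, n_0,
    \]

where n_0 is the historical sample size. The design-stage and analysis-stage approaches covered in Section 3 refine and generalize this idea to less tractable priors and models.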

The fourth part of this short course provides a brief overview of currently available software for Bayesian SSD and analysis. Demonstrations of selected R packages will also be given.

Course Learning Objectives

The intended audience for this course includes statisticians, biostatisticians, and data scientists who hold at least a master's-level degree in biostatistics or a related field. The primary learning objectives for this course are to (1) provide practitioners with a sound understanding of core, cross-cutting concepts for leveraging historical data, (2) help practitioners understand the benefits and challenges of applying Bayesian methods in clinical trial design and analysis using realistic case studies, and (3) teach practitioners about R software that can be used to implement and evaluate Bayesian design and analysis in practice. With a sound understanding of core concepts related to Bayesian analysis, SSD, and ESS, applied practitioners will be better equipped to discuss with internal and external colleagues the most effective approaches for carrying out Bayesian inference to empower decision making.


Modern Statistical Inference Tools for Complex Data Using Artificial Data

Instructors: Minge Xie (Rutgers University) and Junyi Li (Rutgers University)

Minge Xie is a Distinguished Professor of Statistics at Rutgers, The State University of New Jersey. Dr. Xie is currently the Editor of The American Statistician and a co-founding Editor-in-Chief of the New England Journal of Statistics in Data Science. He is a Fellow of the ASA, a Fellow of the IMS, an elected member of the ISI, and a Fulbright Scholar. His research interests include the theoretical foundations of statistical inference and uncertainty quantification in machine learning, fusion learning, finite- and large-sample theory, and parametric and nonparametric methods. He is the Director of the Rutgers Office of Statistical Consulting and has extensive interdisciplinary research experience, collaborating with biomedical researchers, computer scientists, engineers, and scientists in other fields.

Junyi Li is a third-year PhD student in the Department of Statistics at Rutgers, The State University of New Jersey. His doctoral research focuses on developing new repro samples methods and applying state-of-the-art simulation-based inference (SBI) approaches to address complex inference problems in modern statistics and machine learning models. His research interests include statistical inference, simulation-based and likelihood-free methods, uncertainty quantification, and applications in machine learning and data science. He is also currently a Rutgers student consultant in AbbVie Inc.’s Apprentice Program, where he works closely with scientists at AbbVie’s Data & Statistical Sciences (DSS) group to tackle real-world problems arising in pharmaceutical and clinical research.

Abstract:

In today’s data-driven industries, including pharmaceuticals, biotechnology, and technology, statistical challenges increasingly involve high-dimensional, complex, and non-numerical datasets. This one-day course provides an applied introduction to state-of-the-art inferential techniques that leverage simulated artificial data to deliver robust uncertainty quantification, reliable inference, and principled predictions, even in challenging settings.

Participants will explore practical frameworks for likelihood-free and simulation-based inference, including the repro samples method and other modern simulation-based approaches. These techniques offer strong frequentist performance guarantees while maintaining close connections to approximate Bayesian computation and recent amortized Bayesian methods that combine simulated data with machine learning tools for computationally efficient inference.

The course emphasizes inference for discrete or non-numerical model parameters, high-dimensional regression, and predictive inference using distribution-free methods. Robust repro samples and related simulation-based techniques will be presented as flexible tools for addressing complex real-world problems.
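As a highly simplified taste of the artificial-data idea (our sketch, not the repro samples algorithm itself and not course material), the example below builds a confidence set for a normal mean by checking, for each candidate parameter value, whether the observed summary statistic is consistent with statistics computed from artificial datasets simulated at that value. The model, grid, and sample sizes are hypothetical.

    import numpy as np

    rng = np.random.default_rng(2)
    n, sigma, theta_true = 30, 1.0, 1.5
    y_obs = rng.normal(theta_true, sigma, size=n)        # observed data

    alpha, n_artificial = 0.05, 2000
    grid = np.linspace(0.5, 2.5, 401)                    # candidate parameter values
    conf_set = []
    for theta in grid:
        # Generate artificial datasets from the candidate value and compare the
        # observed summary statistic with the simulated ones.
        y_art = rng.normal(theta, sigma, size=(n_artificial, n))
        t_art = y_art.mean(axis=1)                       # summary (nuclear-type) statistic
        lo, hi = np.quantile(t_art, [alpha / 2, 1 - alpha / 2])
        if lo <= y_obs.mean() <= hi:
            conf_set.append(theta)

    print(f"approx. 95% confidence interval: [{min(conf_set):.3f}, {max(conf_set):.3f}]")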

Through illustrative examples, hands-on exercises, and case studies drawn from industry and biomedical research, participants will learn how to draw scientifically sound conclusions, generate reliable predictions, and rigorously quantify uncertainty. Special attention will be given to robustness, computational efficiency, and broad applicability across modern data science applications.

This course is ideal for industry statisticians, data scientists, graduate students, and early-career researchers seeking to integrate cutting-edge inferential tools into their everyday work while bridging modern statistical theory with practical applications.

Course materials are organized into four modules:

  1. Background review on inference using artificial data and inversion techniques: bootstrap, approximate Bayesian computation (ABC), conformal prediction, and other simulation-based or simulation-inspired inference methods.

  2. Repro samples and Fisher inversion techniques: general methodology and connections to Neyman–Pearson test inversion; methods for making inference with discrete or non-numerical parameters and non-standard data types.

  3. Simulation-based inference (SBI): frequentist likelihood-free SBI, recent developments in robust SBI under model misspecification, and their connections to ABC and amortized Bayesian inference approaches.

  4. Illustrations: case studies and R/Python implementations. (a) Models: Gaussian mixtures, high-dimensional regression, tree models, interpretable high-dimensional model-free classification, principled random forests, and change-point detection. (b) Applications: real datasets from clinical trials, gene discovery, and image analysis.


How Large Language Models Process and Reason: A Geometric View of Hidden-State Dynamics

Instructors: Aijun Zhang (Wells Fargo) and Agus Sudjianto (H2O.ai)

Abstract: Large language models like ChatGPT can generate fluent text and perform complex reasoning, but what internal mechanisms enable this behavior? In this course, we present a geometric framework for understanding transformer models by analyzing their hidden states as a structured field over layers and tokens. Rather than focusing on architectural details, we study how representations evolve using tools from linear algebra and statistics.

We introduce a metric-aware representation based on PCA whitening, which provides a consistent geometric space for analysis. Within this space, we define layer-wise and token-wise kernel structures, along with directed interaction operators that describe how representations change across layers and positions. These operators admit a decomposition into symmetric (scaling) and antisymmetric (rotation) components, separating changes in magnitude from changes in direction. Building on this, we introduce curvature, defined via transport around local layer–token neighborhoods, as a quantitative measure of interaction between layer and token dynamics.
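Two of these ingredients are easy to sketch numerically. The toy example below (random matrices standing in for transformer hidden states; the shapes and the least-squares operator are our illustrative choices, not the course's exact construction) applies PCA whitening to hidden states and then splits a layer-to-layer interaction operator into its symmetric (scaling) and antisymmetric (rotation) parts.

    import numpy as np

    rng = np.random.default_rng(3)
    n_tokens, d_model = 64, 32
    H_l  = rng.normal(size=(n_tokens, d_model))                    # hidden states, layer l (toy)
    H_l1 = H_l @ rng.normal(size=(d_model, d_model)) * 0.1 + H_l   # hidden states, layer l+1 (toy)

    def pca_whiten(X, eps=1e-6):
        # Center X and rotate/scale so its empirical covariance is (approximately) the identity.
        Xc = X - X.mean(axis=0)
        cov = Xc.T @ Xc / (len(Xc) - 1)
        vals, vecs = np.linalg.eigh(cov)
        W = vecs / np.sqrt(vals + eps)                             # whitening transform
        return Xc @ W

    Z_l, Z_l1 = pca_whiten(H_l), pca_whiten(H_l1)

    # Least-squares linear operator mapping whitened layer-l states to layer-(l+1) states.
    A, *_ = np.linalg.lstsq(Z_l, Z_l1, rcond=None)

    S = 0.5 * (A + A.T)                                            # symmetric part: scaling
    K = 0.5 * (A - A.T)                                            # antisymmetric part: rotation
    print(f"||S||_F = {np.linalg.norm(S):.3f}, ||K||_F = {np.linalg.norm(K):.3f}")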

Empirically, we show that transformer computation follows a consistent three-phase structure: initial rescaling at the embedding layer, gradual accumulation through intermediate layers, and concentrated nonlinear transformation in the final layers. Curvature-based metrics localize where reasoning occurs, distinguish retrieval from multi-step reasoning, and provide signals for task difficulty and failure modes. By the end of the course, participants will gain both an intuitive and method-level understanding of how large language models process information and perform reasoning.