Plenary Talks


Keynote Presentation: CoT Information
Speaker: Dr. John Lafferty, Yale University
John Lafferty is the John C. Malone Professor in the Department of Statistics and Data Science at Yale University, with a secondary appointment in Computer Science. Lafferty is an Associate Director of the Wu Tsai Institute (WTI) at Yale, a university-wide institute focused on understanding human cognition. He is Director of the Center for Neurocomputation and Machine Intelligence within the WTI. Lafferty’s most recent research lies at the interface of AI, machine learning, and neuroscience.

Abstract: Chain-of-thought (CoT) supervision has emerged as a powerful empirical technique in AI, underpinning much of the recent progress in the reasoning capabilities of large language models. We present a statistical theory of learning under CoT supervision. We introduce a formal framework for representing CoT hypothesis classes, with learning objectives that target small end-to-end error while using CoT supervision during training. Central to the theory is the CoT information measure, which quantifies the additional discriminative power gained by observing the "thought process" when distinguishing between hypotheses with different end-to-end behaviors. The main theoretical results, which include both upper and lower bounds, demonstrate how CoT supervision can yield significantly faster learning rates than classical supervision. Joint work with Awni Altabaa and Omar Montasser.

Keynote Presentation: Empowering Statistics in Biomedical Research – A Personal Perspective
Speaker: Dr. Heping Zhang, Yale University
Heping Zhang, Ph.D., is Susan Dwight Bliss Professor of Biostatistics, Professor in the Child Study Center, Professor of Statistics and Data Science, and Professor of Obstetrics, Gynecology, and Reproductive Sciences at Yale University. He directs the Collaborative Center for Statistics in Science, which coordinates clinical trials evaluating the effectiveness of infertility treatments. He was named the 2008 Myrto Lefkopoulou Distinguished Lecturer by the Harvard School of Public Health and a Medallion Lecturer by the Institute of Mathematical Statistics. He is a former editor of the Journal of the American Statistical Association - Applications and Case Studies. He received the 2022 Neyman Award and Lecture from the Institute of Mathematical Statistics and the 2023 Distinguished Achievement Award from the International Chinese Statistical Association. He was selected as a 2023 Highly Cited Researcher in the Cross-Field category by Web of Science. His research focuses on developing and applying statistical methods in biomedical research, including epidemiology, genetics, mental health, cognition, and reproductive medicine.

Abstract: Advances in technology have led to an explosion of large-scale and diverse data, including text, images, and omics data. Addressing the challenges of analyzing such data has garnered significant attention and generated both excitement and uncertainty within the statistical community, especially regarding our role in the evolving landscape of data science. In this talk, I will begin by highlighting historical dilemmas that may have hindered broader recognition of statistical contributions, then share personal experiences that illustrate the ongoing challenges we face. I will conclude by suggesting strategies for bridging the gap between methodological innovation and real-world application, and by explaining how doing so can strengthen the visibility and impact of statistical science. Please note that this talk reflects my personal opinions only.

Keynote Presentation: Data Integration in Spatial and Single Cell Omics: What is Erased, and Can You Recover It?
Speaker: Dr. Nancy Zhang, University of Pennsylvania
Dr. Zhang is the Ge Li and Ning Zhao Professor of Statistics in The Wharton School at the University of Pennsylvania. Her research focuses primarily on developing statistical methods and computational algorithms for analyzing data from high-throughput biological experiments. She has made contributions to copy number and structural variant detection, to the modeling and estimation of intra-tumor genetic heterogeneity, and to the modeling and analysis of single-cell and spatial genomic data. In statistics, she has made contributions to change-point analysis, variable selection, and model selection. Dr. Zhang obtained her Ph.D. in Statistics from Stanford University in 2005. After one year of postdoctoral training at the University of California, Berkeley, she returned to the Department of Statistics at Stanford University as an Assistant Professor in 2006. She received a Sloan Fellowship in 2011 and formally moved to the University of Pennsylvania with tenure in 2012. She was awarded the Medallion Lectureship by the Institute of Mathematical Statistics in 2021 and the P.R. Krishnaiah Memorial Lectureship in 2023. Her work has been funded by grants from the NSF, the NIH, and the Mark Foundation. At Penn, she is a member of the Abramson Cancer Center and the Graduate Group in Genomics and Computational Biology, and a Senior Fellow of the Institute for Biomedical Informatics. Dr. Zhang currently serves as Vice Dean of the Wharton Doctoral Program.

Abstract: In single-cell and spatial biology, data integration refers to the alignment of cells across samples and modalities; it is a ubiquitous challenge that affects all downstream analyses. The goal of cell integration is to find cells across data sets that share the same biological state, which may be obscured by technical differences. In this talk, I will place the cell integration problem on a continuum from weak to strong linkage, depending on the strength of feature sharing between experiments. First, I will examine integration across data modalities with weak linkage. This arises when the data being integrated share few features, for example, single-cell RNA sequencing data and spatial proteomics data. For this setting, I will present MaxFuse, a method that leverages higher-order relationships among all features, including unshared ones, to achieve accurate integration. Next, I will consider data alignment within the same modality in clinical-scale studies. For this setting, I will show that existing paradigms are overly aggressive, erasing disease and treatment effects and introducing severe data distortion. I will introduce a "pool-of-controls" experimental design concept to disentangle biological variation from unwanted variation. Building on this, I will describe CellANOVA, a novel statistical model and scalable algorithm that recovers biological signals lost during batch integration and corrects integration-related data distortion. Through these two contrasting paradigms, I will share key lessons learned and the remaining challenges in this field.


Speaker: Dr. Michael Lopez, NFL
Michael Lopez is a Senior Director of Football Data and Analytics at the National Football League, where his work centers on using data to enhance and better understand the game of football. He is an Associate Editor of the Journal of Quantitative Analysis in Sports and has written for FiveThirtyEight, Deadspin, Sports Illustrated, and The Hockey News. From 2014 through 2021, he worked at Skidmore College, first as an Assistant Professor and then as a Lecturer and Research Associate. In 2020, he received the American Statistical Association’s Statistics in Sports Significant Contributor Award.