Keynote Presentation
Speakers:
Dr. John Lafferty, Yale University
Dr. Heping Zhang, Yale University
Dr. Nancy Zhang, University of Pennsylvania
Dr. Michael Lopez, NFL
Keynote Panel Discussion: Women in Statistics: Celebrating Advancements and Sharing Advice
Invited by NESS President Dr. Rebecca Betensky and chaired by Dr. Rachel Nethery
Panelists:
Dr. Nalini Ravishanker, Professor of Statistics, University of Connecticut
Dr. Melissa Naylor, Head of Global Portfolio Statistics, Takeda
Dr. Ani Eloyan, Associate Professor of Biostatistics, Brown University
Dr. Cindy Lu, Senior Director Biostatistics, AstraZeneca
Keynote Presentation: CoT Information
Speaker: Dr. John Lafferty, Yale University
John Lafferty is the John C. Malone
Professor in the Department of Statistics and Data Science at
Yale University, with a secondary appointment in Computer
Science. Lafferty is an Associate Director of the Wu Tsai
Institute at Yale, a University-wide institute focused on the
mission of understanding human cognition. He is Director of
the Center for Neurocomputation and Machine Intelligence
within the WTI. Lafferty’s most recent research lies at the
interface of AI, machine learning, and neuroscience.
Abstract: Chain-of-thought supervision has emerged as a powerful empirical technique in AI, underpinning much of the recent progress in the reasoning capabilities of large language models. We present a statistical theory of learning under CoT supervision. A formal framework for representing CoT hypothesis classes is introduced, based on learning objectives focused on achieving small end-to-end error using CoT supervision during training. Central to the theory is the CoT information measure, which quantifies the additional discriminative power gained from observing the "thought process" for distinguishing hypotheses with different end-to-end behaviors. The main theoretical results, with both upper and lower bounds, demonstrate how CoT supervision can yield significantly faster learning rates compared to classical supervision. Joint work with Awni Altabaa and Omar Montasser.
Keynote Presentation: Empowering Statistics in Biomedical Research – A Personal Perspective
Speaker: Dr. Heping Zhang, Yale University
Heping Zhang, Ph.D., is the
Susan Dwight Bliss Professor of Biostatistics, Professor in
the Child Study Center, Professor of Statistics and Data
Science, and Professor of Obstetrics, Gynecology, and
Reproductive Sciences at Yale University. He directs the
Collaborative Center for Statistics in Science, which
coordinates clinical trials to evaluate treatment
effectiveness for infertility. He was named the 2008 Myrto
Lefkopoulou Distinguished Lecturer by the Harvard School of
Public Health and received a Medallion Award and Lectureship from the
Institute of Mathematical Statistics. He is a former editor of
the Journal of the American Statistical Association -
Applications and Case Studies. He received the
2022 Neyman Award and Lecture from the Institute of Mathematical
Statistics and the 2023 Distinguished Achievement Award from the
International Chinese Statistical Association. He was selected
as a 2023 Highly Cited Researcher in the cross-field category by Web of
Science. His research interests lie in developing and applying
statistical methods in biomedical research, including
epidemiology, genetics, mental health, cognition, and
reproductive medicine.
Abstract: The advent of technology has led to an explosion of large-scale and diverse data, including text, images, and omics data. Addressing the challenges arising from analyzing such data has garnered significant attention and generated both excitement and uncertainty within the statistical community, especially regarding our role in the evolving landscape of data science. In this talk, I will begin by highlighting historical dilemmas that might have hindered broader recognition of statistical contributions, then share personal experiences to illustrate ongoing challenges we face. I will conclude by suggesting strategies to bridge the gap between methodological innovation and real-world application, and how doing so can strengthen the visibility and impact of statistical science. I caution that this talk reflects my personal opinion only.
Keynote Presentation: Data Integration in Spatial and Single Cell Omics: What is Erased, and Can you Recover it?
Speaker: Dr. Nancy Zhang, University of Pennsylvania
Dr. Zhang is the Ge Li and
Ning Zhao Professor of Statistics in The Wharton School at the
University of Pennsylvania. Her research focuses primarily on
the development of statistical methods and computational
algorithms for the analysis of data from high-throughput
biological experiments. She has made contributions to copy
number and structural variant detection, to the modeling and
estimation of intra-tumor genetic heterogeneity, and to the
modeling and analysis of single-cell and spatial genomic
data. In statistics, she has made contributions to
change-point analysis, variable selection, and model
selection. Dr. Zhang obtained her Ph.D. in Statistics in 2005
from Stanford University. After one year of postdoctoral
training at the University of California, Berkeley, she returned
to the Department of Statistics at Stanford University as an
Assistant Professor in 2006. She received the Sloan
Fellowship in 2011 and formally moved to the University of
Pennsylvania with tenure in 2012. She was awarded the
Medallion Lectureship by the Institute of Mathematical
Statistics in 2021 and the P.R. Krishnaiah Memorial
Lectureship in 2023. Her work has been funded by grants from
the NSF, NIH, and Mark Foundation. At Penn, she is a member
of the Abramson Cancer Center and the Graduate Group in
Genomics and Computational Biology, and a Senior Fellow of the
Institute of Biomedical Informatics. Dr. Zhang currently
serves as the Vice Dean of the Wharton Doctoral Program.
Abstract: In single-cell and spatial biology, data integration refers to the alignment of cells across samples and modalities, and is a ubiquitous challenge affecting all downstream analyses. The goal in cell integration is to find cells across data sets that share the same biological state, which may be obscured by technical differences. In this talk, I will cast the cell integration problem on a continuum of weak to strong linkage, depending on the strength of feature sharing between experiments. First, I will examine integration across data modalities of weak linkage. This arises when there are few shared features between the data being integrated, for example, between single-cell RNA sequencing data and spatial proteomics data. For this, I will present MaxFuse, a method that leverages higher-order relationships between all features, including unshared features, to achieve accurate integration. Next, I will consider the scenario of data alignment across the same modality in clinical-scale studies. For this setting, I will show that existing paradigms are overly aggressive, erasing disease and treatment effects and introducing severe data distortion. I will introduce a "pool-of-controls" experimental design concept to disentangle biological variation from unwanted variation. Based on this, I will describe CellANOVA, a novel statistical model and scalable algorithm that recovers biological signals lost during batch integration and corrects integration-related data distortion. Through these two contrasting paradigms, I will share the key lessons learned and the remaining challenges in this field.
Speaker: Dr. Michael Lopez, NFL
Michael Lopez is a
Senior Director of Football Data and Analytics at the National
Football League, where his work
centers on how to use data to enhance and better understand
the game of football. He is an Associate Editor at the Journal
of Quantitative Analysis in Sports, and has written for
FiveThirtyEight, Deadspin, Sports Illustrated, and Hockey
News. From 2014 through 2021, he worked at Skidmore College,
first as an Assistant Professor and then as a Lecturer and
Research Associate. In 2020, he received the American
Statistical Association’s Statistics in Sports Significant
Contributor Award.