Plenary Talks

DataTrail: Biostatisticians Building Inclusive Data Science Communities

Speaker: Dr. Jeff Leek, Johns Hopkins University

Jeff Leek is a professor of Biostatistics and Oncology at the Johns Hopkins Bloomberg School of Public Health and co-director of the Johns Hopkins Data Science Lab. His group develops statistical methods, software, data resources, and data analyses that help people make sense of massive-scale genomic and biomedical data. As the co-director of the Johns Hopkins Data Science Lab, he has helped to develop massive online open programs that have enrolled more than 8 million individuals and has partnered with community-based non-profits to use data science education for economic and public health development. He is a Fellow of the American Statistical Association and a recipient of the Mortimer Spiegelman Award and the Committee of Presidents of Statistical Societies Presidential Award.


The data science revolution has led to massive new opportunities in technology, medicine, and business for people with data skills. Most people who have been able to take advantage of this revolution are already well-educated, white-collar workers. In this talk I will describe our effort to expand access to data science jobs to individuals from under-served populations in East Baltimore. I will show how we are combining cloud-based data science technologies, high-throughput educational data, and deep, low-throughput collaboration with local non-profits to create a pathway to data science success we call DataTrail. I will also discuss how you can create a DataTrail program in your community. DataTrail illustrates how statisticians have a unique opportunity in this data moment to lead change in the world.

Inference for Longitudinal Data After Adaptive Sampling

Speaker: Dr. Susan A. Murphy, Harvard University

Susan Murphy’s research focuses on improving sequential, individualized decision making in health, in particular clinical trial design and data analysis to inform the development of just-in-time adaptive interventions in digital health. She developed the micro-randomized trial for use in constructing digital health interventions; this trial design is in use across a broad range of health-related areas. Her lab works on online learning algorithms for developing personalized digital health interventions. Dr. Murphy is a member of the National Academy of Sciences and of the National Academy of Medicine, both of the US National Academies. In 2013 she was awarded a MacArthur Fellowship for her work on experimental designs to inform sequential decision making. She is a Past-President of IMS and of the Bernoulli Society and a former editor of the Annals of Statistics.


Adaptive sampling methods, such as reinforcement learning (RL) and bandit algorithms, are increasingly used for the real-time personalization of interventions in digital applications like mobile health and education. As a result, there is a need to be able to use the resulting adaptively collected user data to address a variety of inferential questions, including questions about time-varying causal effects. However, current methods for statistical inference on such data (a) make strong assumptions regarding the environment dynamics, e.g., assume the longitudinal data follow a Markovian process, or (b) require data to be collected with one adaptive sampling algorithm per user, which excludes algorithms that learn to select actions using data collected from multiple users. These are major obstacles preventing the wider use of adaptive sampling algorithms in practice. In this work, we provide statistical inference for the common Z-estimator based on adaptively sampled data. The inference (a) is valid even when observations are non-stationary and highly dependent over time, and (b) allows the online adaptive sampling algorithm to learn using the data of all users. Furthermore, our inference method is robust to misspecification of the reward models used by the adaptive sampling algorithm. This work is motivated by our work in designing the Oralytics oral health clinical trial, in which an RL adaptive sampling algorithm will be used to select treatments, yet valid statistical inference is essential for conducting primary data analyses after the trial is over.
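For readers unfamiliar with the term in the abstract above, a Z-estimator is defined as the root of an estimating equation. The generic textbook form is sketched below; the symbols ($n$ users, $D_i$ for the longitudinal data of user $i$, estimating function $\psi$) are illustrative assumptions and are not taken from the talk.

```latex
% Generic Z-estimator: \hat{\theta}_n solves the empirical estimating equation
\frac{1}{n} \sum_{i=1}^{n} \psi\bigl(D_i; \hat{\theta}_n\bigr) = 0,
\qquad \text{where the target } \theta^{\star} \text{ satisfies }
\mathbb{E}\bigl[\psi(D_i; \theta^{\star})\bigr] = 0 .
```

Least squares and maximum likelihood are familiar special cases, with $\psi$ taken as the derivative of the loss or the score function. Under adaptive sampling the $D_i$ are no longer independent across users, which is the difficulty the abstract describes.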

Statisticians Rise to the Occasion to Meet Pandemic: Perspectives on COVID-19 Vaccine Development

Session organizer: Qiqi Deng, Moderna
Moderator: Joseph C. Cappelleri, Pfizer

Since Coronavirus Disease 2019 (COVID-19) became a global pandemic in March of 2020, COVID-19 vaccine research and development have been accelerated at an unprecedented pace. Pharmaceutical and biotech companies, together with government and academia, are fully committed to delivering efficacious and safe products to protect the public from COVID-19. Between December 2020 and February 2021, the FDA issued Emergency Use Authorizations (EUAs) for three vaccines, from Pfizer-BioNTech, Moderna, and Janssen. Over the past year, there has been additional positive data on booster, adolescent, and pediatric use. To thoroughly understand how vaccines work, and to improve the efficiency of vaccine development, statisticians have faced many challenges in the design and analysis of vaccine studies. This keynote panel session will bring together four panelists who are statistical leaders in COVID-19 vaccine development from major vaccine manufacturers, the FDA, and academic institutions. They will share their experiences from their distinctive perspectives and provide their latest insights from a broad range of data sources.

Update on Learnings about Immune Correlates of Protection from COVID-19 Vaccine Efficacy Trials
Panelist: Dr. Peter Gilbert, Fred Hutchinson Cancer Research Center

Dr. Peter Gilbert, Professor of Biostatistics at the Fred Hutchinson Cancer Research Center and at the University of Washington, focuses on the statistical design and analysis of randomized clinical trials of candidate vaccines for HIV, SARS-CoV-2, malaria, and other infectious pathogens. He specializes in statistical methods and data analyses of these trials to understand how immune responses to vaccination and genetic features of infectious pathogens impact the protective level of the vaccine, so-called “immune correlates of protection analyses” and “sieve analyses.” Dr. Gilbert is Principal Investigator of the Statistical Data Management Center for the National Institute of Allergy and Infectious Diseases (NIAID)-sponsored HIV Vaccine Trials Network and has co-led statistical science research for the US government-sponsored COVID-19 vaccine clinical research program.


Randomized, placebo-controlled phase 3 COVID-19 vaccine efficacy trials assess how well candidate vaccines prevent infection and disease caused by the SARS-CoV-2 virus. The NIH-supported COVID-19 Prevention Network (CoVPN) continues to co-conduct with vaccine manufacturers five such phase 3 trials, which include the objective to assess antibody markers as various types of “immune correlates of protection (CoPs).” CoPs can be formally defined using several statistical frameworks, including risk prediction, principal stratification vaccine effect modification, and multiple causal mediation approaches. Based on analyses of the phase 3 trials using these statistical frameworks, this talk will provide an update on recent learnings about antibody markers as CoPs for vaccines against COVID-19.

Durability of Covid-19 Vaccines
Panelist: Dr. Danyu Lin, University of North Carolina at Chapel Hill

Danyu Lin, Ph.D., is the Dennis Gillings Distinguished Professor of Biostatistics at the University of North Carolina at Chapel Hill. Dr. Lin is an internationally recognized leader in biostatistics and currently serves as an Associate Editor for Biometrika and JASA. He has published over 200 peer-reviewed papers, most of which appeared in top statistical journals. Dr. Lin received the Mortimer Spiegelman Gold Medal from the American Public Health Association in 1999 and the George W. Snedecor Award from the Committee of Presidents of Statistical Societies in 2015. Other honors include ASA and IMS Fellows, Thomson ISI's list of Highly Cited Researchers in Mathematics, JASA and JRSS(B) discussion papers, and NIH Merit Award.


Evaluating the durability of protection afforded by Covid-19 vaccines is a public health priority, with the results needed to inform policies around booster vaccinations as well as those around non-pharmaceutical interventions. In this talk, I will present a general framework for estimating the effects of Covid-19 vaccines over time in phase 3 clinical trials and observational studies. I will show some results on the duration of vaccine protection from the Moderna pivotal trial and from the North Carolina statewide surveillance data. The latter data, which were published in the New England Journal of Medicine in January, provided rich information about the effectiveness of the Pfizer, Moderna, and Johnson & Johnson vaccines in reducing the risks of Covid-19, hospitalization, and death over time. I will discuss the implications of these results for booster vaccinations.

Evolution of the COVID-19 Vaccine During Pandemic: Transforming Development Paradigm
Panelist: Dr. Satrajit Roychoudhury, Pfizer

Dr. Satrajit Roychoudhury is a Senior Director and a member of the Statistical Research and Innovation group at Pfizer Inc. Prior to joining Pfizer, he was a member of the Statistical Methodology and Consulting group at Novartis. He started his career as a research statistician at Schering Plough Research Institute (now Merck Co.). He has 15 years of extensive experience working with different phases of clinical trials for drugs and vaccines. His areas of research include survival analysis and the use of model-based approaches and Bayesian methods in clinical trials. He has co-authored several publications in peer-reviewed journals and book chapters. In addition, he has provided training and workshops at major statistical conferences. Satrajit is a member of the American Statistical Association (ASA) and the Drug Information Association (DIA). He served as the industry co-chair for the ASA Biopharmaceutical Section Regulatory-Industry Workshop in 2018 and is a member of the current DIA Regulatory-Industry Statistics Forum scientific committee. Satrajit was a recipient of a Young Statistical Scientist Award from the International Indian Statistical Association in 2019.


Vaccines are complex biological products that are administered to healthy individuals. Safety is therefore paramount; vaccine development often entails large, time-consuming, and resource-intensive studies to detect rare safety issues and to establish vaccine efficacy. Before a vaccine is licensed and brought to the market, it undergoes a long and rigorous process of research, followed by many years of clinical testing. However, such a framework requires modification for COVID-19 vaccine development due to high public health demand. This talk will present the operationally seamless development paradigm used to develop an mRNA vaccine for COVID-19. The design contains two parts. The phase 1 part escalated dose levels in small cohorts in two age groups to identify a preferred candidate and dose level. It is followed by a phase 2/3 part to evaluate safety, immunogenicity, and vaccine efficacy. A Bayesian group sequential design was used for the phase 2/3 part. This trial incorporates multiple interim analyses to assess early efficacy and futility of the vaccine. The Bayesian framework enabled us to obtain efficient designs using decision criteria based on the probability of benefit or harm. It also enabled us to incorporate information from previous studies on the treatment effect via the prior distributions. For the COVID-19 vaccine trial, the vaccine efficacy criterion was based on achieving a sufficiently high Bayesian posterior probability. In addition, this trial incorporated early stopping for futility based on Bayesian predictive probabilities. The talk will include key statistical aspects and regulatory challenges of the design. Publicly disclosed results will be summarized and discussed.
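To make the posterior-probability criterion mentioned above concrete, here is a minimal sketch of a beta-binomial calculation. It is an illustration under stated assumptions, not the trial's actual analysis: it assumes 1:1 randomization with equal follow-up, a uniform Beta(1, 1) prior, a 30% efficacy threshold, and made-up case counts. The function names `beta_cdf` and `prob_ve_exceeds` are hypothetical. Under these assumptions, the share of cases occurring in the vaccine arm is theta = (1 - VE) / (2 - VE), so "VE exceeds the threshold" is equivalent to theta falling below a cutoff.

```python
import math


def beta_cdf(x, a, b, steps=20000):
    """Approximate P(theta <= x) for theta ~ Beta(a, b) by midpoint-rule integration."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # Log of the normalizing constant 1 / B(a, b).
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    h = x / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint of the i-th sub-interval of [0, x]
        total += math.exp(log_norm + (a - 1) * math.log(t) + (b - 1) * math.log1p(-t))
    return min(1.0, total * h)


def prob_ve_exceeds(cases_vaccine, cases_placebo, ve_threshold=0.30,
                    prior_a=1.0, prior_b=1.0):
    """Posterior probability that vaccine efficacy exceeds ve_threshold.

    Assumes 1:1 randomization and equal follow-up, so that the probability a
    case falls in the vaccine arm is theta = (1 - VE) / (2 - VE).
    The Beta(prior_a, prior_b) prior on theta is an illustrative choice.
    """
    theta_cutoff = (1.0 - ve_threshold) / (2.0 - ve_threshold)
    post_a = prior_a + cases_vaccine   # conjugate beta-binomial update
    post_b = prior_b + cases_placebo
    # VE > threshold is equivalent to theta < theta_cutoff.
    return beta_cdf(theta_cutoff, post_a, post_b)


if __name__ == "__main__":
    # Hypothetical interim data: 10 cases on vaccine, 90 on placebo.
    p = prob_ve_exceeds(10, 90)
    print(f"P(VE > 30% | data) = {p:.4f}")
```

In a group sequential design, a quantity like this would be compared against a pre-specified threshold at each interim analysis; the thresholds, prior, and number of looks are design choices that this sketch does not attempt to reproduce.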

Associations of Immunogenicity and Reactogenicity After SARS-CoV-2 mRNA-1273 Vaccine
Panelist: Dr. Honghong Zhou, Moderna

Dr. Honghong Zhou is a Sr. Director of Biostatistics at Moderna, Inc. She has been the Biostats lead for Moderna’s COVID-19 vaccine, SPIKEVAX, also known as mRNA-1273. Dr. Zhou has extensive experience supporting late-stage clinical trials in many therapeutic areas including vaccines, rare diseases, infectious diseases, and oncology. She was the lead biostatistician in several successful regulatory submissions, including the FDA’s first tissue/site-agnostic approval, for Keytruda in MSI-H (high level of microsatellite unstable tumors) or dMMR (defective mismatch repair) solid tumors, and Keytruda for first-line melanoma.


mRNA-1273, a messenger RNA (mRNA) vaccine for Covid-19, has demonstrated high efficacy in preventing illness, including severe illness and death. The vaccine induced robust immune responses that were generally consistent among children, adolescents, and both younger and older adults, consistent with its efficacy benefit. The frequency and severity of reactogenic events are an important aspect of safety assessment, providing information on physical phenomena associated with inflammatory responses to vaccination.

In this talk, we assess the associations between immunogenicity and reactogenicity after two injections of 100 µg mRNA-1273 in adults (≥18 years) in the phase 3 COVE trial and in adolescents (12-17 years) in the phase 2/3 TeenCOVE trial.

Data Minding Before Data Mining

Speaker: Dr. Xiao-Li Meng, Harvard University

Xiao-Li Meng, the Whipple V. N. Jones Professor of Statistics and the Founding Editor-in-Chief of Harvard Data Science Review (HDSR), is well known for his depth and breadth in research, his innovation and passion in pedagogy, his vision and effectiveness in administration, as well as for his engaging and entertaining style as a speaker and writer. Meng was named the best statistician under the age of 40 by COPSS (Committee of Presidents of Statistical Societies) in 2001, and he is the recipient of numerous awards and honors for his more than 150 publications in at least a dozen theoretical and methodological areas, as well as in areas of pedagogy and professional development. In 2020, he was elected to the American Academy of Arts and Sciences. He has delivered more than 400 research presentations and public speeches on these topics, and he is the author of “The XL-Files,” a thought-provoking and entertaining column in the IMS (Institute of Mathematical Statistics) Bulletin. His interests range from the theoretical foundations of statistical inferences (e.g., the interplay among Bayesian, Fiducial, and frequentist perspectives; frameworks for multi-source, multi-phase and multi-resolution inferences) to statistical methods and computation (e.g., posterior predictive p-value; EM algorithm; Markov chain Monte Carlo; bridge and path sampling) to applications in natural, social, and medical sciences and engineering (e.g., complex statistical modeling in astronomy and astrophysics, assessing disparity in mental health services, and quantifying statistical information in genetic studies). Meng received his BS in mathematics from Fudan University in 1982 and his PhD in statistics from Harvard in 1990. He was on the faculty of the University of Chicago from 1991 to 2001 before returning to Harvard, where he served as the Chair of the Department of Statistics (2004-2012) and the Dean of Graduate School of Arts and Sciences (2012-2017).