ASA Statistical Learning and Data Science
United States
Joined Oct 28, 2021
Weijie Su: How Statistics Can Advance Large Language Models: Fairness Alignment and Watermarking
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS)
December webinar: How Statistics Can Advance Large Language Models: Fairness Alignment and Watermarking
Record: December 3, 2024
Presenter: Weijie Su is an Associate Professor in the Wharton Statistics and Data Science Department and, by courtesy, in the Departments of Computer and Information Science and Mathematics at the University of Pennsylvania. He is a co-director of the Penn Research in Machine Learning Center. Prior to joining Penn, he received his Ph.D. in statistics from Stanford University in 2016 and his bachelor's degree in mathematics from Peking University in 2011. His research interests span the statistical foundations of AI, privacy-preserving machine learning, high-dimensional statistics, and optimization. He serves as an associate editor of the Journal of Machine Learning Research, the Journal of the American Statistical Association, Foundations and Trends in Statistics, and Operations Research. His work has been recognized with several awards, including the Stanford Anderson Dissertation Award, the NSF CAREER Award, a Sloan Research Fellowship, the IMS Peter Hall Prize, the SIAM Early Career Prize in Data Science, the ASA Noether Early Career Award, and the ICBS Frontiers of Science Award in Mathematics.
Abstract: Large language models (LLMs) have rapidly emerged as a transformative innovation in machine learning. However, their increasing influence on human decision-making processes raises critical societal questions. In this talk, we will demonstrate how statistics can help address two key challenges: ensuring fairness for minority groups through alignment and combating misinformation through watermarking. First, we tackle the challenge of creating fair LLMs that equitably represent and serve diverse populations. We derive a regularization term that is both necessary and sufficient for aligning LLMs with human preferences, ensuring equitable outcomes across different demographics. Second, we introduce a general statistical framework to analyze the efficiency of watermarking schemes for LLMs. We develop optimal detection rules for an important watermarking scheme recently developed at OpenAI and empirically demonstrate their superiority over the existing detection method. Throughout the talk, we will showcase how statistical insights can not only address pressing challenges posed by LLMs but also unlock substantial opportunities for the field of statistics to drive responsible generative AI development. This talk is based on arXiv:2405.16455 and arXiv:2404.01245.
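The watermarking part of the abstract can be made concrete with a small sketch. Assuming the OpenAI scheme referred to is the Gumbel-max watermark proposed by Scott Aaronson, generation picks the token w that maximizes U_w^(1/p_w), where p is the model's next-token distribution and U is a vector of pseudorandom uniforms derived from a secret key and the preceding text. This choice still samples exactly from p, but it makes the pivot Y_t = U_{t,w_t} unusually large. A detector can then sum -log(1 - Y_t), which follows a Gamma(n, 1) distribution under the null of unwatermarked text. Everything below is illustrative: the toy next-token distribution, the seeding on the previous token only, and all function names are hypothetical. The optimal detection rules from arXiv:2404.01245 are not reproduced here; this is the simpler sum-based statistic of the kind such rules are compared against.

```python
import math
import random

# Hypothetical toy setup: tokens are integers 0..vocab_size-1 and the
# pseudorandom numbers depend only on a secret key and the previous token.


def pseudorandom_uniforms(key, prev_token, vocab_size):
    """One Uniform(0, 1) number per vocabulary token, reproducible from (key, prev_token)."""
    rng = random.Random(f"{key}:{prev_token}")
    # (k + 0.5) / 2**52 lies strictly inside (0, 1), so log(u) and log(1 - u) are finite
    return [(rng.getrandbits(52) + 0.5) / 2**52 for _ in range(vocab_size)]


def toy_probs(prev_token, vocab_size):
    """Hypothetical stand-in for an LLM's next-token distribution."""
    rng = random.Random(prev_token)
    weights = [rng.random() + 0.1 for _ in range(vocab_size)]
    total = sum(weights)
    return [w / total for w in weights]


def gumbel_max_choice(probs, uniforms):
    """Return argmax_w u_w ** (1 / p_w), computed as argmax_w log(u_w) / p_w."""
    best_token, best_score = None, -math.inf
    for token, (p, u) in enumerate(zip(probs, uniforms)):
        if p > 0:
            score = math.log(u) / p
            if score > best_score:
                best_token, best_score = token, score
    return best_token


def generate_watermarked(key, length, vocab_size, start=0):
    """Generate `length` tokens with the Gumbel-max rule, starting after `start`."""
    tokens = [start]
    for _ in range(length):
        probs = toy_probs(tokens[-1], vocab_size)
        uniforms = pseudorandom_uniforms(key, tokens[-1], vocab_size)
        tokens.append(gumbel_max_choice(probs, uniforms))
    return tokens


def gamma_upper_tail(s, n):
    """P(Gamma(n, 1) >= s) for integer n >= 1 and s > 0, i.e. P(Poisson(s) <= n - 1)."""
    return sum(math.exp(-s + k * math.log(s) - math.lgamma(k + 1)) for k in range(n))


def detect(tokens, key, vocab_size):
    """Sum of -log(1 - Y_t) with pivot Y_t = U_{t, w_t}; returns (score, p-value)."""
    score = 0.0
    for prev, token in zip(tokens, tokens[1:]):
        y = pseudorandom_uniforms(key, prev, vocab_size)[token]
        score += -math.log(1.0 - y)  # each term is Exp(1) under H0
    n = len(tokens) - 1
    return score, gamma_upper_tail(score, n)


if __name__ == "__main__":
    V, T, KEY = 1000, 200, 42
    watermarked = generate_watermarked(KEY, T, V)
    sampler = random.Random(7)
    human = [0]  # ordinary sampling from the same toy model, independent of the key
    for _ in range(T):
        human.append(sampler.choices(range(V), weights=toy_probs(human[-1], V))[0])
    print("watermarked:", detect(watermarked, KEY, V))
    print("unwatermarked:", detect(human, KEY, V))
```

Running the script should show a much larger score and a vanishing p-value for the watermarked sequence, and an unremarkable p-value for the independently sampled one. Seeding on the previous token alone is a simplification: repeated contexts reuse the same pseudorandom vector, which a practical detector may need to account for.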
For more information about or to join ASA SLDS, visit
community.amstat.org/slds/home
www.amstat.org/
173 views
Videos
Nathaniel O’Connell: A Comparison of Methods of Cross-Validation for Small Data
105 views · 21 days ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) November webinar: A Comparison of Methods of Cross-Validation for Small Data - Practical Guidance for Prediction Model Development with limited sample sizes Record: November 19, 2024 Presenter: Dr. Nathaniel (Nate) O’Connell is an assistant professor in the Department of Biostatistics and Data Scienc...
Runze Li: High-Dimensional Statistical Inference
311 views · 1 month ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) October webinar: High-Dimensional Statistical Inference Record: October 29, 2024 Presenter: Runze Li is the Eberly Family Chair Professor in Statistics, The Pennsylvania State University. He served as Co-Editor of the Annals of Statistics from 2013 to 2015. Runze Li is a Fellow of IMS, ASA and AAAS. His ...
Jing Lei: Winners with Confidence: Discrete Argmin Inference with an Application to Model Selection
261 views · 2 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) September webinar: Winners with Confidence: Discrete Argmin Inference with an Application to Model Selection Record: September 24, 2024 Presenter: Jing Lei is Professor of Statistics & Data Science at Carnegie Mellon University. He received his Bachelor of Science degree from the School of Mathematic...
Emmanuel Candès: Statistical methods for assessing the factual accuracy of large language models
1.8K views · 3 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) August webinar: Statistical methods for assessing the factual accuracy of large language models Record: August 29, 2024 Presenter: Emmanuel Candès is the Barnum-Simons Chair in Mathematics and Statistics, professor of electrical engineering (by courtesy), and a member of the Institute of Computationa...
Bikram Karmakar: A new paradigm for causal inference in the presence of unmeasured confounders
209 views · 4 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) July webinar: A new paradigm for causal inference in the presence of unmeasured confounders by calibrating a resistant population's variance Record: July 30, 2024 Presenter: Bikram Karmakar is an Assistant Professor in the Statistics Department at University of Florida. Prof Karmakar teaches advanced...
Bei Jiang: Online Local Differential Private Quantile Inference
84 views · 5 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) June webinar: Online Local Differential Private Quantile Inference Record: June 21, 2024 Presenter: Dr. Bei Jiang is an Associate Professor at the Department of Mathematical and Statistical Sciences of the University of Alberta, a Fellow and a Canada CIFAR AI chair affiliated with the Alberta Machine...
Andrej Risteski: The statistical cost of score-based losses
641 views · 6 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) May webinar: The statistical cost of score-based losses Record: May 30, 2024 Presenter: Andrej Risteski is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. Prior to that, he was a Norbert Wiener...
Bin Yu: Why Veridical Data Science? And How?
269 views · 8 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) March webinar: Why Veridical Data Science? And How? Record: April 4, 2023 Presenter: Bin Yu is Chancellor's Distinguished Professor and Class of 1936 Second Chair in Statistics, EECS, and Computational Biology at UC Berkeley. Her research focuses on the practice and theory of statistical machine lear...
Mladen Kolar: Adaptive Stochastic Optimization with Constraints
204 views · 9 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) February webinar: Adaptive Stochastic Optimization with Constraints Record: February 27, 2024 Presenter: Mladen Kolar is a professor in the Department of Data Sciences and Operations at the USC Marshall School of Business. Mladen earned his PhD in Machine Learning from Carnegie Mellon University in 2...
Andrew Gelman: Learning from mistakes
1.5K views · 10 months ago
Links mentioned in the talk: Election poll example: web.archive.org/web/20090326143823/www.fivethirtyeight.com/2009/03/how-did-white-people-vote.html Nudge example: statmodeling.stat.columbia.edu/2009/05/11/discussion_and/ statmodeling.stat.columbia.edu/2022/06/04/pizzagate-and-nudge-an-opportunity-lost/ This talk: statmodeling.stat.columbia.edu/2024/01/23/learning-from-mistakes-my-online-talk-...
Weijie Su: A Regression Approach to Peer Review in NeurIPS/ICML
316 views · 11 months ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) December webinar: Can Statistics Save Machine Learning from a Crisis? A Regression Approach to Peer Review in NeurIPS/ICML Record: December 15, 2023 Presenter: Weijie Su is an Associate Professor at the University of Pennsylvania, with an appointment in the Wharton Statistics and Data Science Departm...
Glen Wright Colopy: The Pareto Principle in Data Science: Maximizing Value and Efficiency
389 views · 1 year ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) November webinar: The Pareto Principle in Data Science: Maximizing Value and Efficiency Record: November 30, 2023 Presenter: Glen Wright Colopy is the Head of Data Science & Statistics at Wildfell, a startup specializing in custom software and data science solutions for the biotech and life science i...
Mengye Ren: Lifelong Learning in Structured Environments
255 views · 1 year ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) October webinar: Lifelong Learning in Structured Environments Record: October 26, 2023 Presenter: Mengye Ren is an assistant professor of computer science and data science at New York University (NYU). Before joining NYU, he was a visiting faculty researcher at Google Brain Toronto working with Prof....
Krishna Balasubramanian: Optimization-based analysis of sampling algorithms
208 views · 1 year ago
American Statistical Association (ASA), Section on Statistical Learning and Data Science (SLDS) September webinar: Optimization-based analysis of sampling algorithms Record: September 27, 2023 Presenter: Krishna Balasubramanian is currently an Associate Professor in the Department of Statistics at the University of California, Davis, where he also holds affiliations with the Graduate Group in A...
Lucas Janson: Exact Conditional Independence Testing and Conformal Inference
693 views · 1 year ago
Hongtu Zhu: Statistical Learning Methods for Neuroimaging Data Analysis with Applications
367 views · 1 year ago
Rina Foygel Barber: Stability of black-box algorithms
548 views · 1 year ago
Qiqi Deng: A Brief introduction for drug development and how biostatistician can contribute
217 views · 1 year ago
Jason Klusowski: Pointwise Behavior of Recursive Partitioning
229 views · 1 year ago
George Michailidis: Statistical models for mixed frequency data in forecasting economic indicators
1.2K views · 1 year ago
Hui Zou: Sparse Convoluted Rank Regression in High Dimensions
381 views · 1 year ago
Barbara Day: How to conduct a successful job search and negotiate your best offer
104 views · 1 year ago
Ryan Tibshirani: Delphi's Epidata Project
198 views · 2 years ago
Linglong Kong: Exploration and Optimization in Deep Reinforcement Learning
138 views · 2 years ago
Zhenyu Zhao: From Experimentation to Causal Learning
254 views · 2 years ago
Very inspirational!
👍
Thanks for sharing!
On the topic of being an asshole when giving criticism. It's like Gelman says, you're best positioned to try and see through the delivery to the content but on the other hand as someone giving feedback it's important to be clear so that the other party doesn't have to do that work. It means that it's on everybody to try and communicate clearly but also for us to acknowledge that the rough delivery comes not from bad people but from people who are feeling something which motivated them to say something in the first place but that can pollute the message. Being shocking is useful to catch people's attention but especially once the dialogue is going you need to cut it out ASAP. But if we need to rely on shock to cut through noise then ideally you don't rely on that but rather find a way to make the environment less noisy so that consensus is enough to make the right conversation happen.
Very Good Presentation!
Great talk!
Thank you for this very helpful overview!