Consent in Crisis: The Rapid Decline of the AI Data Commons - Nov 14, 2024
- Published on Jan 25, 2025
- Our second session of the AI Reading Group featured the paper "Consent in Crisis: The Rapid Decline of the AI Data Commons" from the Data Provenance Initiative, which explores how rapid shifts in web consent mechanisms are fundamentally altering the landscape for AI data collection. The paper presents a large-scale audit of web sources used in AI training, revealing an escalating trend of websites restricting access to their data, with over 45% of key web sources for AI now imposing restrictions. These developments could profoundly impact the future of AI development by limiting access to diverse and fresh data, which is essential for training robust AI models.
Paper link: arxiv.org/abs/...
About the authors:
Shayne Longpre’s research is at the intersection of AI and policy: responsibly training, evaluating, and governing general-purpose AI systems. He leads the Data Provenance Initiative, led the Open Letter on A Safe Harbor for Independent AI Evaluation & Red Teaming, and has contributed to training models like Bloom, Aya, and Flan-T5/PaLM. His work has received Best Paper Awards from ACL 2024 and NAACL 2024, as well as coverage by the NYT, Washington Post, Atlantic, 404 Media, Vox, and MIT Tech Review.
Ariel N Lee is a generative AI research scientist specializing in multimodal and large language models. At Raive, she is developing multimedia foundation models with IP attribution. Her work includes co-leading the Platypus LLM project, which achieved state-of-the-art performance in open-source models, and contributing to the Data Provenance Initiative, analyzing AI data access restrictions. With an M.Sc. in electrical and computer engineering from Boston University, she focuses on efficient model refinement, data quality, diffusion, and open-source AI development. She is committed to advancing AI through collaborative research that addresses both technical innovations and ethical considerations.
Campbell Lund is a graduate of Wellesley College, where she received her BS in Computer Science. Her research interests lie in how technology intersects with society, and she has been collaborating with the Data Provenance Initiative to conduct large-scale audits of the data used to train large language models. Campbell is currently pursuing her MSc in Data and Artificial Intelligence Ethics at the Edinburgh Futures Institute, where she plans to continue exploring themes of algorithmic justice and advocating for human-in-the-loop approaches to the development and deployment of AI.