"AI Safety Through Interpretable and Controllable Language Models" - Peter Hase, YRSS

  • Published on Dec 26, 2024
  • Originally presented on: Wednesday, November 20th, 2024 at 11:00am CT, TTIC, 6045 S. Kenwood Avenue, 5th Floor, Room 530
    Title: "AI Safety Through Interpretable and Controllable Language Models"
    Speaker: Peter Hase, University of North Carolina at Chapel Hill
    Abstract: In a 2022 survey, 37% of NLP experts agreed that "AI decisions could cause nuclear-level catastrophe" in this century. This survey was conducted prior to the release of ChatGPT. The research community's now-common concern about catastrophic risks from AI highlights that long-standing problems in AI safety are as important as ever. In this talk, I will describe research on two core problems at the intersection of NLP and AI safety: (1) interpretability and (2) controllability. We need interpretability methods to verify that models use acceptable and generalizable reasoning to solve tasks. Controllability refers to our ability to steer individual behaviors in models on demand, which is helpful since pretrained models will need continual adjustment of specific knowledge and beliefs about the world. This talk will cover recent work on (1) open problems in interpretability, including mechanistic interpretability and chain-of-thought faithfulness, (2) fundamental problems with model editing, viewed through the lens of belief revision, and (3) scalable oversight, with a focus on weak-to-strong generalization. Together, these lines of research aim to develop rigorous technical foundations for ensuring the safety of increasingly capable AI systems.
    Bio: Peter Hase is an AI Resident at Anthropic, working on the Alignment Science team. He recently completed his PhD at the University of North Carolina at Chapel Hill. His research focuses on NLP and AI Safety, with a particular emphasis on techniques for explaining and controlling model behavior. He has previously worked at AI2, Google, and Meta.
    Timestamps:
    00:00
    00:05 Intro
    00:43 Lecture
    58:05 Q&A
    #lm #languagemodel #artificialintelligence #ai #machinelearning #algorithm #computervision #robotics #research
