What is mechanistic interpretability? Neel Nanda explains.

AXRP

3,503 views

Published on 21 Feb 2023
Art by @hamishdoodles
Clipped from episode 19 of AXRP: • 19 - Mechanistic Inter...
Transcript of that episode: axrp.net/episode/2023/02/04/e...
---
AXRP patreon: / axrpodcast
AXRP ko-fi: ko-fi.com/axrpodcast
Science & Technology

Comments • 7

@antigonemerlin 1 year ago ⁺⁴
The images are quite helpful, especially for a complete beginner to the field when it comes to terms like stochastic descent. This channel is very underrated.
@axrpodcast 1 year ago ⁺¹
Thanks - nice to hear!
@Words-. 6 months ago
Thank you!
@Words-. 6 months ago
What if we have an AI that does this for us? And an ai that interprets the interpreter and so on. Maybe an ai wave process in order to give us a constant state of interpretation of what is going on.
@reidelliot1972 3 months ago ⁺¹
There are approaches that use this tactic for outer alignment. I highly recommend checking out the classics: Christiano IDA and debate, etc. It's definitely a common motif in this area of research. But then again, I've seen people raise concerns that automating interpretability tools may enable deceptively aligned policies/agents to further entrench themselves.
Check out "AGI-Automated Interpretability is Suicide" by RicG
@user-vt4bz2vl6j 2 months ago
thats great but how would you know its doing it correctly...
@Words-. 2 months ago
@@user-vt4bz2vl6j That is a fair question, idk. But at least its a step
