Great overview. Really enjoyed that you showed the previous work this builds on.
Thanks! To be honest I only briefly mentioned their previous work and don't think I actually went through the prior literature (I was just doing a walkthrough of their blogpost, still doing daily uploads), but I'll definitely keep this preference for discussing previous work in mind for future videos
That was an excellent walkthrough, thank you. I've learned a lot. Would love to see more walkthroughs of the prior/related work
Thanks! My walkthrough of the previous Anthropic paper (prior work): th-cam.com/video/HAxd8DoZaW4/w-d-xo.html
For other interpretability papers I'd recommend checking out Neel Nanda's series of walkthroughs (he's actually leading a mechanistic interpretability team at DeepMind): th-cam.com/play/PL7m7hLIqA0hpsJYYhlt1WbHHgdfRLM2eY.html&si=tLqxLua5XZEdbyCy
Crystal clear. Thank you for sharing this. Subscribed!
Thanks! Tomorrow's video will be another walkthrough so hopefully worth the sub
This seems actually useful and has real-world applications.
It seems this allows for actually adjusting the personality of the model, so one could make it more averse to writing buggy code, more flirty, more honest, or whatever. The big AI labs could adjust small details without needing to retrain the AI.
Also, I guess this could be done with open-source models to find their "deny response" features and set them to very low values. You could do that with retraining, but that also just changes the model. Not needing such brute-force methods is neat.
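The steering idea being discussed can be sketched with a toy two-layer network: add a scaled "feature direction" to the hidden activations at inference time, no retraining involved. Everything here is a made-up stand-in (in the actual work, feature directions are found with a sparse autoencoder on a real LLM's activations; `feature`, `scale`, and the weights below are just illustrative):

```python
import numpy as np

def forward(x, W1, W2, steering=None, scale=0.0):
    """Toy two-layer net. Optionally adds a steering vector to the
    hidden activations, mimicking clamping a learned feature up or down."""
    h = np.maximum(0, x @ W1)          # hidden activations (ReLU)
    if steering is not None:
        h = h + scale * steering       # nudge along the feature direction
    return h @ W2

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 8))
W2 = rng.normal(size=(8, 8))
x = rng.normal(size=(1, 8))

# Hypothetical feature direction; in practice this would be a direction
# a sparse autoencoder identified (e.g. a "deny response" feature).
feature = np.zeros(8)
feature[0] = 1.0

plain = forward(x, W1, W2)
steered = forward(x, W1, W2, steering=feature, scale=5.0)
```

The point is that `plain` and `steered` come from the exact same weights; only the intermediate activations were shifted, which is why this is cheaper and more surgical than retraining.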
Yeah exactly, that lets you steer them in the way that you'd prefer. If you haven't tried it yet I'd recommend checking out Golden Gate Claude (which I talk about in the video), available on claude.ai for a limited time, which basically gives a concrete example of what having a custom steered LLM would be like.
@TheInsideView I asked it to go one prompt without mentioning the bridge and tell me a bedtime story, and it got extremely internally conflicted, retrying several times and wondering why it had such difficulty with this.
It's extremely interesting to witness. Thanks for notifying me that they were hosting that model, I didn't know.