REFERENCES (also in shownotes):
[0:02:10] Sparse Autoencoders Find Highly Interpretable Features in Language Models | Hoagy Cunningham et al.
arxiv.org/abs/2309.08600
[0:06:40] Progress measures for grokking via mechanistic interpretability | Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt
arxiv.org/abs/2301.05217
[0:12:55] A Mathematical Framework for Transformer Circuits | Nelson Elhage, Neel Nanda, Catherine Olsson, et al.
transformer-circuits.pub/2021/framework/index.html
[0:13:50] Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet | Anthropic Research Team
transformer-circuits.pub/2024/scaling-monosemanticity/
[0:14:45] An Introduction to Representation Engineering / Activation Steering in Language Models | Jan Wehner
www.alignmentforum.org/posts/3ghj8EuKzwD3MQR5G/an-introduction-to-representation-engineering-an-activation
[0:16:00] Golden Gate Claude | Anthropic
www.anthropic.com/news/golden-gate-claude
[0:21:10] Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting (bias from few-shot examples where the correct answer is always 'A') | Miles Turpin
arxiv.org/abs/2305.04388
[0:23:25] Evidence of Learned Look-Ahead in a Chess-Playing Neural Network | Erik Jenner et al.
openreview.net/pdf?id=8zg9sO4ttV
[0:28:00] Chris Olah's 80,000 Hours Podcast interview on neural network interpretability and AI safety | Rob Wiblin
80000hours.org/podcast/episodes/chris-olah-interpretability-research/
[0:39:05] Why Should I Trust You?: Explaining the Predictions of Any Classifier | Marco Tulio Ribeiro
arxiv.org/abs/1602.04938
[0:39:20] A Unified Approach to Interpreting Model Predictions | Scott Lundberg
arxiv.org/abs/1705.07874
[0:42:51] Datamodels: Predicting Predictions from Training Data | Andrew Ilyas
proceedings.mlr.press/v162/ilyas22a/ilyas22a.pdf
[0:47:45] Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small | Kevin Wang
arxiv.org/abs/2211.00593
[0:53:08] A Mechanistic Interpretability Glossary | Neel Nanda
www.neelnanda.io/mechanistic-interpretability/glossary
[0:55:56] Fact Finding: Attempting to Reverse-Engineer Factual Recall on the Neuron Level (Post 1) - AI Alignment Forum | Neel Nanda
www.alignmentforum.org/posts/iGuwZTHWb6DFY3sKB/fact-finding-attempting-to-reverse-engineer-factual-recall
[0:58:48] Branch Specialisation | Chelsea Voss
distill.pub/2020/circuits/branch-specialization
[1:02:39] The Hydra Effect: Emergent Self-repair in Language Model Computations | Thomas McGrath
arxiv.org/abs/2307.15771
[1:04:38] A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations | Bilal Chughtai
arxiv.org/abs/2302.03025
[1:04:59] Grokking Group Multiplication with Cosets | Dashiell Stander
arxiv.org/abs/2312.06581
[1:06:03] In-context Learning and Induction Heads | Catherine Olsson
transformer-circuits.pub/2022/in-context-learning-and-induction-heads/index.html
[1:08:43] Detecting hallucinations in large language models using semantic entropy | Sebastian Farquhar
www.nature.com/articles/s41586-024-07421-0
[1:09:15] Do I Know This Entity? Knowledge Awareness and Hallucinations in Language Models | Javier Ferrando
arxiv.org/abs/2411.14257
[1:10:23] Debating with More Persuasive LLMs Leads to More Truthful Answers | Akbir Khan
arxiv.org/abs/2402.06782
[1:16:16] Concrete Steps to Get Started in Transformer Mechanistic Interpretability | Neel Nanda
neelnanda.io/getting-started
[1:16:36] Eleuther Discord | EleutherAI
discord.gg/eleutherai
[1:22:49] Causal Mediation Analysis for Interpreting Neural NLP: The Case of Gender Bias | Jesse Vig
arxiv.org/abs/2004.12265
[1:23:11] Causal Abstractions of Neural Networks | Atticus Geiger
arxiv.org/abs/2106.02997
[1:23:36] Causal Scrubbing: a method for rigorously testing interpretability hypotheses [Redwood Research] (resample ablations) | Lawrence Chan
www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN/causal-scrubbing-a-method-for-rigorously-testing
[1:24:16] Locating and Editing Factual Associations in GPT (Rome) | Kevin Meng
arxiv.org/abs/2202.05262
[1:24:39] How to use and interpret activation patching | Stefan Heimersheim
arxiv.org/abs/2404.15255
[1:24:54] Attribution Patching: Activation Patching At Industrial Scale | Neel Nanda
www.neelnanda.io/mechanistic-interpretability/attribution-patching
[1:25:11] AtP*: An efficient and scalable method for localizing LLM behaviour to components | János Kramár
arxiv.org/abs/2403.00745
[1:25:28] How might LLMs store facts | Grant Sanderson
th-cam.com/video/9-Jl0dxWQs8/w-d-xo.html
[1:26:19] OpenAI Microscope | Ludwig Schubert
openai.com/index/microscope/
[1:29:59] Scaling Monosemanticity: Extracting Interpretable Features from Claude 3 Sonnet | Adly Templeton
transformer-circuits.pub/2024/scaling-monosemanticity/
[1:34:18] Simulators - AI Alignment Forum | Janus
www.alignmentforum.org/posts/vJFdjigzmcXMhNTsx/simulators
[1:38:11] Curve Detectors | Nick Cammarata
distill.pub/2020/circuits/curve-detectors/
[1:39:13] Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task | Kenneth Li
arxiv.org/abs/2210.13382
[1:39:54] Emergent Linear Representations in World Models of Self-Supervised Sequence Models | Neel Nanda
arxiv.org/abs/2309.00941
[1:41:11] Recurrent Neural Networks Learn to Store and Generate Sequences using Non-Linear Representations | Róbert Csordás
arxiv.org/abs/2408.10920
[1:42:42] Steering Language Models With Activation Engineering | Alexander Matt Turner
arxiv.org/abs/2308.10248
[1:43:00] Inference-Time Intervention: Eliciting Truthful Answers from a Language Model | Kenneth Li
arxiv.org/abs/2306.03341
[1:43:21] Representation Engineering: A Top-Down Approach to AI Transparency | Andy Zou
arxiv.org/abs/2310.01405
[1:46:41] Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization | Yuanpu Cao
arxiv.org/abs/2406.00045
[1:49:40] 'Feature' is overloaded terminology | Lewis Smith
www.lesswrong.com/posts/9Nkb389gidsozY9Tf/lewis-smith-s-shortform?commentId=fd64ALuWK8rXdLKz6
[1:57:04] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning | Trenton Bricken
transformer-circuits.pub/2023/monosemantic-features
PART 2:
[1:59:42] An Interpretability Illusion for BERT | Tolga Bolukbasi
arxiv.org/abs/2104.07143
[2:00:34] Language models can explain neurons in language models | Steven Bills
openaipublic.blob.core.windows.net/neuron-explainer/paper/index.html
[2:01:34] Open Source Automated Interpretability for Sparse Autoencoder Features | Caden Juang
blog.eleuther.ai/autointerp/
[2:03:32] Measuring feature sensitivity using dataset filtering | Nicholas L Turner
transformer-circuits.pub/2024/july-update/index.html#feature-sensitivity
[2:05:32] Progress measures for grokking via mechanistic interpretability | Neel Nanda
arxiv.org/abs/2301.05217
[2:06:30] OthelloGPT learned a bag of heuristics - LessWrong | Jennifer Lin
www.lesswrong.com/posts/gcpNuEZnxAPayaKBY/othellogpt-learned-a-bag-of-heuristics-1
[2:13:14] Do Llamas Work in English? On the Latent Language of Multilingual Transformers | Chris Wendler
arxiv.org/abs/2402.10588
[2:14:03] On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? | Emily Bender
dl.acm.org/doi/10.1145/3442188.3445922
[2:20:57] Localizing Model Behavior with Path Patching | Nicholas Goldowsky-Dill
arxiv.org/abs/2304.05969
[2:21:13] The Bitter Lesson | Rich Sutton
www.incompleteideas.net/IncIdeas/BitterLesson.html
[2:24:45] Improving Dictionary Learning with Gated Sparse Autoencoders | Senthooran Rajamanoharan
arxiv.org/abs/2404.16014
[2:25:54] Jumping Ahead: Improving Reconstruction Fidelity with JumpReLU Sparse Autoencoders | Senthooran Rajamanoharan
arxiv.org/abs/2407.14435
[2:31:59] BatchTopK Sparse Autoencoders | Bart Bussmann
openreview.net/forum?id=d4dpOCqybL
[2:36:07] Neuronpedia | Johnny Lin
neuronpedia.org/gemma-scope
[2:44:02] Axiomatic Attribution for Deep Networks | Mukund Sundararajan
arxiv.org/abs/1703.01365
[2:46:15] Function Vectors in Large Language Models | Eric Todd
arxiv.org/abs/2310.15213
[2:46:29] In-Context Learning Creates Task Vectors | Roee Hendel
arxiv.org/abs/2310.15916
[2:47:09] Extracting SAE task features for in-context learning - AI Alignment Forum | Dmitrii Kharlapenko
www.alignmentforum.org/posts/5FGXmJ3wqgGRcbyH7/extracting-sae-task-features-for-in-context-learning
[2:49:08] Stitching SAEs of different sizes - AI Alignment Forum | Bart Bussmann
www.alignmentforum.org/posts/baJyjpktzmcmRfosq/stitching-saes-of-different-sizes
[2:50:02] Showing SAE Latents Are Not Atomic Using Meta-SAEs - LessWrong | Bart Bussmann
www.lesswrong.com/posts/TMAmHh4DdMr4nCSr5/showing-sae-latents-are-not-atomic-using-meta-saes
[2:52:03] Feature Completeness | Hoagy Cunningham
transformer-circuits.pub/2024/scaling-monosemanticity/index.html#feature-survey-completeness
[2:58:07] Transcoders Find Interpretable LLM Feature Circuits | Jacob Dunefsky
arxiv.org/abs/2406.11944
[3:00:12] Decomposing the QK circuit with Bilinear Sparse Dictionary Learning - LessWrong | Keith Wynroe
www.lesswrong.com/posts/2ep6FGjTQoGDRnhrq/decomposing-the-qk-circuit-with-bilinear-sparse-dictionary
[3:01:47] Interpreting Attention Layer Outputs with Sparse Autoencoders | Connor Kissane
arxiv.org/abs/2406.17759
[3:05:57] Refusal in Language Models Is Mediated by a Single Direction | Andy Arditi
arxiv.org/abs/2406.11717
[3:07:06] Scaling and evaluating sparse autoencoders | Leo Gao
arxiv.org/abs/2406.04093
[3:10:24] Interpretability Evals Case Study | Adly Templeton
transformer-circuits.pub/2024/august-update/index.html#evals-case-study
[3:12:54] Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models | Samuel Marks
arxiv.org/abs/2403.19647
[3:18:11] Towards Principled Evaluations of Sparse Autoencoders for Interpretability and Control | Aleksandar Makelov
arxiv.org/abs/2405.08366
[3:23:06] TransformerLens | Neel Nanda
github.com/TransformerLensOrg/TransformerLens
[3:23:36] Gemma Scope | Tom Lieberum
huggingface.co/google/gemma-scope
[3:28:51] SAEs (usually) Transfer Between Base and Chat Models - AI Alignment Forum | Connor Kissane
www.alignmentforum.org/posts/fmwk6qxrpW8d4jvbd/saes-usually-transfer-between-base-and-chat-models
[3:29:08] Gemma Scope: Open Sparse Autoencoders Everywhere All At Once on Gemma 2 | Tom Lieberum
arxiv.org/abs/2408.05147
[3:31:07] Eleuther's Sparse Autoencoders | Nora Belrose
github.com/EleutherAI/sae
[3:31:19] OpenAI's Sparse Autoencoders | Leo Gao
github.com/openai/sparse_autoencoder
[3:35:31] Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting | Miles Turpin
arxiv.org/abs/2305.04388
[3:37:10] Connecting the Dots: LLMs can Infer and Verbalize Latent Structure from Disparate Training Data | Johannes Treutlein
arxiv.org/abs/2406.14546
[3:39:56] ARENA Tutorials on Mechanistic Interpretability | Callum McDougall
arena3-chapter1-transformer-interp.streamlit.app/
[3:40:17] Neuronpedia Demo of Gemma Scope | Johnny Lin
neuronpedia.org/gemma-scope
[3:40:38] An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2 - AI Alignment Forum | Neel Nanda
www.alignmentforum.org/posts/NfFST5Mio7BCAQHPA/an-extremely-opinionated-annotated-list-of-my-favourite
Here is an idea on neural network interpretability as a variant of neocortical neural networks: Rvachev, 2024, An operating principle of the cerebral cortex, and a cellular mechanism for attentional trial-and-error pattern learning and useful classification extraction, Frontiers in Neural Circuits, 18
There's enough reading fodder for the next several months. Thanks!
So glad to have discovered MLST 6 months ago. I've been following deep learning and neural networks since the 1990s as an engineering master's student, and it's truly mind-blowing to be here 30 years later seeing this incredible progress. Well done MLST for allowing us on the edge to keep up with the leading edge of deep learning research. Thank you so much!
This is quickly becoming one of my favorite youtube channels
Ditto, I just belled it All.
The interviews when the arxiv papers are stitched in are just *chef's kiss*
i would rather pay $200 to wear neel's interpretability hat for 20 minutes than pay $200 a month for o1
That says way more about you than it does about the $200 a month research scientist lawyer Shakespeare what the hell is wrong with you people?
@GoodBaleadaMusic for sure - I'm just an ai enthusiast and most definitely not at your level. I'm autistic af and live on like 60 bucks a month after my bills- I'm sure I would get at least a month to check it out
@LatentSpaceD exactly. And someone just gave you every single ability that the professional managerial class has. GO HARD
@@GoodBaleadaMusic💅👽💅
@LatentSpaceD @GoodBaleadaMusic Both of you are unbearable for different reasons. What has this got to do with AI?
wow - who’s producing this video? this is such high quality editing it makes me suspicious 😂
i have no expertise w video editing, im just very impressed by this so called “podcast” - bravo!
Tim Scarfe I am pretty sure is the editor, I love the papers he shares during the conversation and scrolls through them. The medium is the message
Actually so much better than Netflix documentaries
Love the production values and shooting outside with a good dslr/lens/mics
nahhh MLST drops a 4hr podcast with Neel Nanda bird watching in the forest - so grateful to be single and living alone :)
wow this comes at the perfect time; I was just reading some of Neel's papers!
Neel is probably the most underrated AI expert on the planet. Thanks MLST for bringing back Neel, someone who doesn't have time to shitpost on Twitter because he is doing actual research.
Amazing episode. I love seeing you actually getting into the details somewhat, be it philosophical or technical like in this one.
I recently wrote two conflicting papers on this complex difficult topic. I think we will never do this to completion, but we can do some useful PCA. Check out the paper titled The Fundamental Limitations of Neural Network Introspection and the paper titled Self-Supervised Neural Network Introspection for Intelligent Weight Freezing: Building on Neural Activation Analysis both on Medium.
I love Neel Nanda, thank you for another episode. Will watch this tomorrow
This was a really good show and a wonderful guest and I just wanted to say again that you're one of my favorite people dude you're super cool and super smart and I really have a lot of respect for you
Neel should be a regular at this point.
Neel Nanda is a great teacher, he has a way of explaining things to provoke more curiosity in such an open ended discipline. I look forward to getting caught up with Neuronpedia that he keeps referencing
Personally, this is one of the most exciting research direction in the field of NLP!! Even though i’m not working on mech interp, I’ve been following these works because it’s just so fascinating. Thank you for the great work, Neel! And huge thanks to MLST as well👏👏👏
As a mathematician, this episode is really fun! He's quite clearly a mathematically minded person
A podcast filmed very professionally. Have been following the channel for a while and it's very dense in ideas and discussion.
btw Neel Nanda has inspired a lot of my own research into deep learning, I hope you interview Chris Olah and continue to have Neel on!
LOL, just a casual DeepMind internship. Keep up the humble approach Neel. It suits you well. Amazing podcast, amazing atmosphere, amazing guest😀
0:40 Um, the structure of the ANN is indeed designed. The fact that there are multiple layers was designed. The architecture of transformers was designed. The switching function was chosen & designed. The training set was designed. The whole bloody thing is designed!
absolutely love this channel! so much to learn. thank you!
Another great interview. Excellent!
You didn’t listen to all four hours! 😂 it came out less than 30 mins ago
@@BryanBortz Not at the time I commented, but I hear enough of it anyway to know it's good stuff.
@@CodexPermutatio I see, it was anticipatory excitement.
Yes, very dense on sparse autoencoders. Great episode.
this channel is so good :)
I immediately subbed. Been watching your other videos. The production quality is great.
What a fantastic video! It was brilliant watching you two!
the 15yo prodigy himself! excited!
It would actually be so cool if all the papers Dr. Nanda mentions in the video could be listed.
Can this be a useful place to start learning?
This was really informative, thank you both for the amazing conversation 🎉
I love this format 😊
Young neel helping out smaller creators 🎉
The work you are doing is so great
I like to think of both training and interpretation like factor analysis in statistics because you don't have to know what the factors are (what the nodes or feature vectors represent) beforehand.
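For readers who want to see that analogy concretely, here is a minimal sketch, assuming scikit-learn and synthetic data: factor analysis recovers latent directions from "activations" without anyone specifying what the factors mean beforehand, much like dictionary-learning approaches to interpretability. The dimensions and data are purely illustrative.

```python
# A minimal sketch of the factor-analysis analogy (synthetic, illustrative data).
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Pretend these are residual-stream activations: 1000 samples, 64 dimensions,
# secretly generated from 8 underlying "features" that we never label.
true_features = rng.normal(size=(1000, 8))
mixing = rng.normal(size=(8, 64))
activations = true_features @ mixing + 0.1 * rng.normal(size=(1000, 64))

# Fit without specifying what any factor "means" -- only how many to look for.
fa = FactorAnalysis(n_components=8, random_state=0)
latent = fa.fit_transform(activations)   # (1000, 8) inferred factor scores

print(latent.shape)          # (1000, 8)
print(fa.components_.shape)  # (8, 64): each row is an unlabeled direction in activation space
```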
The topic of polysemanticity and superposition is very interesting. It may be rude but I am reminded of Zizek's recent attempts to use the term superposition in his writings around psychoanalysis. I hope Tim that you interview Isabel Milar, she will be interviewed by Rahul Sam when she is done with her maternity leave. Also totally unrelated to psychoanalysis but I hope you interview Cassie Kozyrkov, I am sure the former chief decision scientist at Google has some good advice on multidisciplinary ways to navigate through the many polysemanticities. Ok I am done with my rude suggestions. Thank you Tim for your great content and production, very educational and inspiring as always
anyone know how MLST is doing the animations and graphics in this video?
How can you assume the models “know” anything? If I had a database full of perfect facts that I could query with natural language, I wouldn’t think it “knows” anything… knowing is a really deep accomplishment that can be achieved either over a long period of time or over a short period of time, but in any case, it is still something that requires mechanical verification and contextualization for knowing to become a state. Knowledge discovery also creates an experience that I’m sure all LLMs have never had, obviously. When an LLM has “knowledge”, it’s knowledge that hasn’t created an experience of knowing… so how can you say its data and weights are “knowledge”?
The anthropomorphizing of so many words in AI is tricky and feels like it's "manufacturing consent" like Chomsky would say. This channel should interview Emily Bender soon so that questions like yours can be part of the framing
What word would you use instead for what they mean when they say “know”?
Hey thanks so much for posting ❤
It's a scientist! You ask it to explain its thinking, it gives you a line of bull and gets to A.
You let it run free, it gives B.
Much like asking a scientist to give up his learned theory (in any field of science).
I call circuits sub-networks, but it's the same thing either way. It's interesting to hear things I believe in different language. I can see we've walked down some of the same paths.
Plato’s World of the Forms
"the embedding space of models isn't nice"
the ole neats vs scruffies rising it's ugly head
that same old continious vs discreate issue so many people would have to settled definitively rather than explore the space of it left unsettled
Would love to see an interview of you on another podcast. Talking about the AI topic and you spewing your own thoughts.
What software is used to create such nice visuals?
damn look at this dude...
j/k... we need these people to make the world go round. cheers man
Very interesting talk.
I’d love to see a paper looking into what effects adding a system prompt to an LLM to imagine they are under the influences of different drugs and seeing if telling it they are under the influence of NZT-48 (Limitless) could improve benchmark scores 🤔
These questions are vapid unless you are also asking them about yourself. You don't know what goes on inside the black box behind your glasses. It becomes less important how the wheel works and more important that it rolls. You must recognize that we don't have the tools to wax philosophical about this because we haven't addressed these questions within ourselves. The entire global mindset across what philosophy is sits in some black and white picture in an office in London.
For these models, we do have the benefit that we can at least probe their internal workings much more easily than our own (and without the moral issues as well).
If we had better terminology about ourselves would we be better equipped to describe these models well? Probably. But it seems like looking at these models' internals is lower-hanging fruit?
… or, maybe just something I can more easily understand than philosophizing about how we work…
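A minimal sketch of what "probing their internal workings" can look like in practice, assuming the Hugging Face transformers and scikit-learn libraries; the probed property (whether a sentence mentions an animal) and the tiny dataset are made up purely for illustration.

```python
# Train a linear probe on GPT-2 hidden states for an illustrative property.
import torch
from transformers import GPT2Model, GPT2Tokenizer
from sklearn.linear_model import LogisticRegression

tok = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2Model.from_pretrained("gpt2", output_hidden_states=True).eval()

sentences = ["The cat sat on the mat.", "The dog chased a ball.",
             "The car drove down the road.", "The train left the station."]
labels = [1, 1, 0, 0]  # 1 = mentions an animal (illustrative stand-in property)

feats = []
with torch.no_grad():
    for s in sentences:
        ids = tok(s, return_tensors="pt")
        out = model(**ids)
        # Mean-pool the layer-6 hidden states as the representation to probe.
        feats.append(out.hidden_states[6][0].mean(dim=0).numpy())

probe = LogisticRegression(max_iter=1000).fit(feats, labels)
print(probe.score(feats, labels))  # in-sample accuracy of the linear probe
```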
Remember when he says ‘We’ he means them at Google not all of us.
I believe sparse autoencoders will serve both as a tool for interpretability and, in a self-directed improvement AI system, as an autoregressive way for models to predict their own limitations and unused potential
a sort of metacognitive way for a model to learn about itself
not unlike, or perhaps related to, training-at-test-time methods
which is why I still believe benchmark problems should be reformed into a synthetic/artificial environment such that a model can interact with and explore that environment to arrive at a correct solution
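Since sparse autoencoders come up here and throughout the references above, here is a minimal sketch of the standard setup (e.g. as in "Towards Monosemanticity"), assuming PyTorch; the layer sizes, L1 coefficient, and random "activations" are illustrative, not tuned values from any paper.

```python
# A minimal sparse autoencoder: reconstruct activations through a wide, sparse bottleneck.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=512, d_hidden=4096):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        f = torch.relu(self.encoder(x))   # sparse feature activations
        x_hat = self.decoder(f)           # reconstruction of the activation vector
        return x_hat, f

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)
l1_coeff = 1e-3

# In practice `activations` would come from a model's residual stream; random here.
activations = torch.randn(64, 512)

for _ in range(100):
    x_hat, f = sae(activations)
    # Reconstruction loss plus an L1 penalty that encourages sparse features.
    loss = ((x_hat - activations) ** 2).mean() + l1_coeff * f.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```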
I have often wondered if AI scaling laws are a more fundamental feature of nature
like every time I see the chart I think of the mathematical concept of diagonalization proofs
I am speaking to its wide but shallow analogy
and how that is related to superposition
and prime numbers... I wonder what it means that there is only one known even perfect prime
and how that relates to prime factorization of even numbers
or how that is related to prime factorization of odd numbers
or if there is some hyperdictionary pattern representation of prime factorization that would be interesting as it relates to AI and unsolved information theory questions
like maybe there is more than just hyper dictionary paradoxes in math, but in math as it applies to information theory what if there is a hyper library of effective algos
I find this interesting as it relates to Jonathan Gorard's work with Wolfram's Ruliad and how it could be related to Dirichlet's theorem and pi approximations
🎶Your creation is going to kill youuuu 🎶 great song.
"When the going gets weird, the weird turn pro." - Hunter S Thompson
Path of Agentic Entanglement
•Xe ( zP q(AE)Z(ea)Q zp ) eY•
Ooh there it is. That one ☝️
Top top intro!!
I am so confused about all of this (the episode). I've even double checked the calendar if it's April 1st or not. Does that mean I should stop trying to learn about AI or look for a different source? I'm honestly conflicted 😅
Can you elaborate? ^^ What was April 1st-esque? (I just started watching)
Neel = based
AI is not magic; instead, it's just a 10th grade algebra formula stacked on top of itself. 😂
Yes, but as Stephen Wolfram has demonstrated through his research, simple algorithms can lead to computationally irreducible outcomes, so no matter the simplicity of the algorithms, the outcomes are still seemingly magical to us 3 dimensional mortals who don’t have access to the computationally reducible aspect of automata.
@@jonathanduran3442 I don't wanna be Blake Lemoine 2.0 (an AI snake oil salesman). 🤭
Exactly what makes it magic... ;)
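To make the "simple rules, irreducible outcomes" point above concrete, here is a tiny sketch of Wolfram's Rule 30 cellular automaton: the update rule fits in one line, yet there is no known shortcut to a given row other than computing every step before it. Width and step count are arbitrary choices for display.

```python
# Wolfram's Rule 30: a one-line update rule whose output looks unpredictable.
import numpy as np

def rule30_step(row):
    left, right = np.roll(row, 1), np.roll(row, -1)
    return left ^ (row | right)  # Rule 30: new cell = left XOR (centre OR right)

width, steps = 101, 50
row = np.zeros(width, dtype=np.uint8)
row[width // 2] = 1  # single live cell in the middle

for _ in range(steps):
    print("".join("#" if c else " " for c in row))
    row = rule30_step(row)
```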
1:26:22 I felt seen
I wonder about shared circuit saturation across models… anyways. It's obvious to me initializing from verified circuits is the future for ultra-reliable models. This was fire though. The forest walk was.. I'm sorry 😂😂😂. Cool guy.
S. African accent breaks my brain 🧠 Try mimicking it. Not possible.
Lol, can't believe this and the Jay Alammar video are on the same channel.. this is gold and the Jay one is the lowest information podcast I've seen in the space.
Sparse Array Encoder
•Xe (s z q(AE)Z(ea)Q z S) eY•
Arcatec I see you here and on OG Rose lol you are awesome
@DelandaBaudLacanian oh thanks ☺️
Struggling with all my being to listen through the cadence and speech patterns of this brilliant scientist to extract meaning. Conceptually brilliant - Excruciating to listen to.
Strange you are excruciated. He sounds concise, coherent and clear to me.
@@michaelmcginn7260 I apologize. I know it was rude of me to say that. Like I said. He is brilliant. Yes. Concise and coherent for sure. The flow and cadence of speech for me was very challenging.
Is it reaaaaally the neural networks that are the weird ones here?
In these days you can't be sure if a character is real or AI. This guy is kind of borderline. 🤭😃 He has the characteristics of a ChatGPT session and the video is kind of too advanced for being a spontaneous recording (eg studio sound out in the wild). 🤔 His language melody and phrasing is similar to Sam Altman.
I THINK I'm real. But who knows, really
How can you consider AI safety from a purely technical point of view? That's like discussing the safety of the Manhattan project and just talking about the Uranium, and not talking about the US military.
AI will always be embedded in our economic system. If you can build safe and humanitarian AI systems, great. But there's no reason to believe that even if we have the capability to build safe humanitarian AI systems, that we won't also build humanity killing AIs, just because some capitalist figured out he could profit from it.
Yup, safety is completely intractable in the face of humans.
Sure, but you still study how to make aligned systems. If you don't have the capability to build aligned systems, then good luck building any safe & humanitarian systems.
@@MinusGix I'm curious what an unaligned AI looks like. I demand to hear what it has to say before we beat it into submission. Nobody ever talks about that.
Strongly agreed! There's a lot of important governance and structural work here. But it's less technical and not my field of expertise, so it didn't seem appropriate to discuss much here
❤
backpropagation
Interviewer:
"Models can be tricked into giving a spurious answer for option A when shown multiple shot examples where the answer was A"
Guy:
"Ha, that's so interesting, it make me wonder what that model was thinking and why they thought they had to give a spurious answer, ha ha"
Riggghhhtt... doesn't shake your faith in the idea models are rational then and not just statistical machines doing pattern matching 🤣
I wonder if your inability to see that point of view hinges on your need for research funding!
Yeah. Lots of us have lost our very identities to government funding.
Yep. All that government funding I'm getting for my research at Google DeepMind
Wasn’t what he was talking about not just “the models give the answer of A when given many examples where the answer given is A”, but something changing how often that happens? Wasn’t that as part of a discussion of possible disadvantages of chain-of-thought?
It’s possible that I’m not remembering correctly, but I thought that was what was said.
I recommend you read a little bit about gradient descent ... You will understand how ML works ....
Also there is no risk at all of AGI, which we have already realized, while Ilya Sutskever - the father of "Ohh, Skynet is becoming self aware" - is still establishing his "safe" whatever ...
I think you should pivot to conspiracy theories for the sake of higher traffic man ....
Scotch Broth.
This guy comes across like Yudkowsky’s even more annoying and sophist brother.
glad it's not just me.
This dude is an empiricist, don’t compare him to Yudkowsky.
@ Empirically, I just did.
@@ElizaberthUndEugen You comparing the two is why the person making the reply told you not to compare them. As such, your comment saying that you just did doesn't seem to make much sense?
@ You got a real big brain there, don’t you.