Scaling interpretability

  • Published Jun 12, 2024
  • Science and engineering are inseparable. Our researchers reflect on the close relationship between scientific and engineering progress, and discuss the technical challenges they encountered in scaling our interpretability research to much larger AI models.
    Read more: anthropic.com/research/engine...

Comments • 68

  • @palimondo
    @palimondo months ago +61

    It’s a win for humanity when quants quit finance to work on AI interpretability! Thank you 🙏

    • @df4privateyoutube722
      @df4privateyoutube722 months ago +1

      Honestly, this is amazing if it becomes a widespread trend.

  • @shawnvandever3917
    @shawnvandever3917 months ago +26

    More videos like this please !!!

  • @taiyoinoue
    @taiyoinoue months ago +11

    I absolutely love the results on interpretability discussed here. The Scaling Monosemanticity paper blew my mind, and I was raving about it to anyone who would listen. It is so wonderful to get the chance to see you all talk about this stuff. When I was a kid, I wanted to be one of those NASA engineers who sat in the command center doing calculations to explore outer space. Alas, I'm now a middle-aged pure mathematician. But now, if I was a kid, I'd want to be an AI interp researcher and do calculations to explore the space of possible minds.

  • @user-to9ub5xv7o
    @user-to9ub5xv7o months ago +10

    # Interpretability Engineering at Anthropic
    ## Chapter 1: Introductions and Background
    **0:00–1:15**
    - Team members: Josh Batson, Jonathan Marcus, Adly, TC.
    - Backgrounds in finance, machine learning, and backend work.
    ## Chapter 2: Recent Interpretability Release
    **1:15–4:41**
    - Transition from small to large models.
    - Goal: Extract interpretable features from production models.
    ## Chapter 3: Discoveries and Features
    **4:41–8:16**
    - Examples: Functions that add numbers, a veganism feature.
    - Multimodal features: code backdoors, hidden cameras.
    ## Chapter 4: Golden Gate Claude Experiment
    **8:16–10:42**
    - Experiment: Claude responding with Golden Gate Bridge information.
    - Rapid implementation and success.
    ## Chapter 5: Scaling Challenges
    **10:42–13:24**
    - Scaling the dictionary learning technique.
    - Transition from a single GPU to multiple GPUs.
    - Sparse autoencoders: scalability and initial doubts.
    ## Chapter 6: Engineering Efforts and Trade-offs
    **13:24–17:17**
    - Efficiently shuffling large datasets.
    - Balancing short-term experiments and long-term infrastructure.
    - Parallel shuffling for massive data.
    ## Chapter 7: Research Engineering Dynamics
    **17:17–23:43**
    - Differences between product and research engineering.
    - Importance of flexible, iterative development.
    - Strategies for testing and debugging.
    ## Chapter 8: Interdisciplinary Collaboration
    **23:43–32:03**
    - Collaboration enhances outcomes.
    - Importance of diverse skill sets.
    - Pairing different experts together.
    ## Chapter 9: Future of Interpretability
    **32:03–39:29**
    - Vision: Analyze all layers of production models.
    - Goals: Understand feature interactions and model circuits.
    - Scaling techniques to address AI safety challenges.
    ## Chapter 10: Personal Reflections and Team Dynamics
    **39:29–53:12**
    - Personal motivations for working in interpretability.
    - Challenges and satisfactions of the field.
    - Encouragement for new team members to apply.
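    The "dictionary learning" and "sparse auto-encoders" of Chapter 5 can be sketched in a few lines. This is a minimal NumPy illustration of the general technique, with toy dimensions; the names, sizes, and loss coefficient here are illustrative assumptions, not Anthropic's actual code.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    d_model, d_dict, batch = 16, 64, 256  # toy sizes; production runs use millions of features

    # Overcomplete dictionary: many more features (d_dict) than activation dims (d_model).
    W_enc = rng.normal(0.0, 0.1, (d_model, d_dict))
    b_enc = np.zeros(d_dict)
    W_dec = rng.normal(0.0, 0.1, (d_dict, d_model))

    def sae_forward(x):
        """Encode activations into non-negative sparse features, then reconstruct."""
        f = np.maximum(x @ W_enc + b_enc, 0.0)  # ReLU keeps feature activations >= 0
        x_hat = f @ W_dec
        return f, x_hat

    def sae_loss(x, l1_coeff=1e-3):
        """Reconstruction error plus an L1 penalty that pushes features toward sparsity."""
        f, x_hat = sae_forward(x)
        recon = float(np.mean((x - x_hat) ** 2))
        sparsity = l1_coeff * float(np.mean(np.abs(f).sum(axis=1)))
        return recon + sparsity

    x = rng.normal(size=(batch, d_model))  # stand-in for residual-stream activations
    f, x_hat = sae_forward(x)
    print(f.shape, x_hat.shape)
    ```

    Training minimizes `sae_loss` by gradient descent; the rows of `W_dec` then become the "features" discussed in the video. The scaling challenge in Chapters 5–6 is doing this over billions of activation vectors instead of a 256-row toy batch.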

  • @TomGally
    @TomGally months ago +17

    This is a great discussion. Many thanks for posting it.
    I read your “Scaling Monosemanticity” paper soon after it was released and have been telling people how important it is. It’s pretty dense reading, though, and its implications are not yet widely recognized. Nearly every day I still see comments from people dismissing large language models as “just predicting the next word.” I hope Anthropic can produce more videos like this but aimed at a wider audience, so that more people will understand how meaning is represented in LLMs and how their performance can be adjusted for safety and other purposes.

  • @trpultz
    @trpultz months ago +23

    I have my issues with Claude, but I appreciate the openness! Looking forward to (hopefully) more round tables in the future!

    • @sarahdrawz
      @sarahdrawz months ago +6

      like what? just curious

    • @battlepug3122
      @battlepug3122 months ago +1

      @@sarahdrawz censorship, bias, political correctness etc.

  • @johnnykidblue
    @johnnykidblue months ago +46

    Now maybe people will finally stop saying “they don’t really understand, they’re just predicting the next word.”
    They do understand, and they will take your job.

    • @penguinista
      @penguinista months ago +6

      We’re still going to have to wait a while before that’s generally understood.

    • @therealestmc85
      @therealestmc85 months ago +1

      They don't understand anything.

    • @TheRealUsername
      @TheRealUsername months ago +1

      It's not understanding, it's memorization

    • @mathiastossens3653
      @mathiastossens3653 months ago +7

      @@TheRealUsername It’s really a pointless discussion unless we agree on some definition of "understanding." If you look it up, you’ll see there is no commonly accepted definition. It’s just an abstract concept we can use as an ever-moving goalpost for these AIs to reach. I would say that if you are able to predict what someone really smart would say, it doesn’t really matter whether you call that understanding or something else.

    • @lyeln
      @lyeln months ago +4

      People still saying that these models don't understand anything are just scared and in denial

  • @cheshirecat111
    @cheshirecat111 months ago +7

    I just want to say thank you very much to Anthropic and its employees for being more considerate of the risks of AI, compared to OpenAI. Thank you for working on interpretability which has the promise of being able to control models, which will be very important when AGI comes.

  • @kekekekatie
    @kekekekatie months ago +3

    More of this please! There's a real hunger out here in reality-land for this stuff.

  • @jimberry7865
    @jimberry7865 months ago +2

    Nice to see behind the curtains!

  • @bobbyjunelive1993
    @bobbyjunelive1993 months ago +1

    Thank you for not saying ‘right’ after every claim. I would enjoy more bench-engineering discussions like this.
    It keeps our leaders in view…and I’m so happy, I would say, that we have likable, believable, and even attractive figureheads. Even in the hardware zone, like Nvidia, everyone is stellar. Love it. Now we also get these top-level working scientists: brains we need to hear from to fill in the gargantuan gaps the founders must leave out. Would love more.
    One last format worth trying is a group with zero leads. Maybe not only coders: perhaps one coder, a marketer, tech support, a psychologist, etc.
    Let’s expand this transparency (not just for safety) so we can not only enjoy the exercise of it all but also get another smattering of education that lets those of us out here move with you as you unleash it all.
    Thank you.

  • @NandoPr1m3
    @NandoPr1m3 months ago +5

    Thank you all for the work you do! Interpretability is a cornerstone for adaptability by the general public. As societal impact of this technology also SCALES, we need both the transparency and the tools to mitigate fear of the unknown. Please keep this type of content coming!

  • @haihuang5879
    @haihuang5879 months ago +1

    It's so cool to see big achievements made by newly joined team members!

  • @mattwesney
    @mattwesney months ago

    love this. glad to see you guys putting content out like this! a lot of us are rooting for you

  • @TheLegendaryHacker
    @TheLegendaryHacker months ago +7

    7:35 Huh, this makes me wonder if you could "rip out" the part of Sonnet that fires for hidden cameras, put that into a smaller model, and get a lightweight SOTA hidden camera detector

    • @ehza
      @ehza months ago

      That's interesting!
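    The "rip out the part that fires for hidden cameras" idea above roughly amounts to using a learned feature direction as a lightweight linear probe. Here is a hedged sketch of that idea; the feature vector, bias, and threshold below are entirely made up for illustration and do not come from Sonnet or any real SAE.

    ```python
    import numpy as np

    rng = np.random.default_rng(1)
    d_model = 16  # toy activation width

    # Hypothetical feature direction and bias, standing in for a trained
    # SAE feature (e.g. one that fires on images of hidden cameras).
    w_feature = rng.normal(size=d_model)
    b_feature = -0.5

    def feature_activation(act):
        """Project an activation vector onto the feature direction (ReLU'd)."""
        return max(float(act @ w_feature) + b_feature, 0.0)

    def feature_fires(act, threshold=0.0):
        """A tiny detector: does the feature activate above the threshold?"""
        return feature_activation(act) > threshold

    acts = rng.normal(size=(4, d_model))  # stand-in activations for four inputs
    print([feature_fires(a) for a in acts])
    ```

    Whether such a probe would actually match a full model's accuracy is an open question; the sketch only shows why the comment's idea is computationally cheap once the feature direction exists.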

  • @nossonweissman
    @nossonweissman 26 days ago

    40:27 this is so on point and commonly overlooked. Applies in all areas of life!

  • @mustafaozgul5427
    @mustafaozgul5427 months ago

    It’s amazing what wonderful achievements the new members have accomplished 👏👏👏👏👏.
    Understanding nanostructures and molecular-level physiology and chemistry in cells is the kind of knowledge and experience that lets you understand the whole human organism at the macro level. The researchers here use various versions of that approach in AI to scale interpretability and understand the mechanisms behind AI’s answers. As a researcher, I find them really interesting.

  • @TheFeedRocket
    @TheFeedRocket months ago +1

    The difficulty is that these models may come up with the same answer in different ways, just as if you asked 10 humans to come up with an answer or an idea, each person would arrive at it somewhat differently. Given that the models are not fully mature, just adding vision changes things a lot, so how they come to an answer might continually change. Just when you think you understand a certain aspect of how it arrives at an answer, you could pull out some key feature that was assumed to be needed and it still comes up with the answer.
    What we have built here is truly alien; then again, how we think is really alien to us too, since we have so little understanding of how the human mind works either. I actually think humans are closer to an LLM than we want to believe. Words are very important, and as with an LLM, sometimes the content doesn’t matter: just the act of reading and absorbing more words increases a child’s ability in many areas, as it does for the LLM. But you can’t fully compare how we think to these models; although similar in some ways, it’s very dangerous to compare the two.
    Looking forward to more of these, or even a live Q&A.

  • @TheExodusLost
    @TheExodusLost months ago +3

    This is a great type of content. Getting AI researchers in a room together talking to other researchers is a new take for most of us.
    I’m curious whether they get a bonus or incentive to do this podcast; the woman seems a little nervous!

  • @jerrygonzalezbarrera1435
    @jerrygonzalezbarrera1435 13 days ago

    Great talk!

  • @devinhansa2137
    @devinhansa2137 months ago +1

    Great!!!

  • @RalphDratman
    @RalphDratman months ago

    This is fantastic news -- that you've been able to do this.
    Is it possible for the world to experience some of this directly?

  • @ehza
    @ehza months ago +2

    ❤ This is cool!

  • @samyman2006
    @samyman2006 months ago +1

    Nice!

  • @ItzGanked
    @ItzGanked months ago +1

    more content like this

  • @arnavprakash7991
    @arnavprakash7991 25 days ago

    Really hope Claude Sonnet 3.5 brings more attention to Anthropic; right now it’s the only solid competition to OpenAI.

  • @fintech1378
    @fintech1378 months ago +1

    wheres Ilya

  • @elhorriabdelbasset2550
    @elhorriabdelbasset2550 months ago +1

    more like this

  • @FlorentTavernier
    @FlorentTavernier 27 days ago

    based

  • @sutthiguy1584
    @sutthiguy1584 months ago

    Why am I thinking that Apple’s Private Cloud Compute for AI will be coming soon 🤔

  • @shawnfromportland
    @shawnfromportland months ago +1

    heady stuff 🍻

  • @jaydwivedi8399
    @jaydwivedi8399 29 days ago

    one of them sounds like sam altman

  • @jinsong2231
    @jinsong2231 27 days ago +1

    I’m a veteran of LLMs, a developer to be exact. I’ve used GPT, Qwen, Kimi, o1, LLaMA, etc., but my Claude account was locked right after I signed up; I’ve never even experienced it once. You’re never going to get past OpenAI and Meta, cheapskates.

  • @420_gunna
    @420_gunna months ago

    50:00 superalignment in shambles

  • @nanow1990
    @nanow1990 months ago +1

    Multimodal image-to-text features arise because YOU TRAINED IT WITH CAPTIONS; thus the text caption on an image is tied to the text.

    • @nanow1990
      @nanow1990 months ago +1

      HOW CAN'T THEY UNDERSTAND? IT'S INSANE THAT THEY ARE GETTING PAID FOR THIS

    • @nanow1990
      @nanow1990 months ago +1

      When multimodal models form features across images and text even though the images came WITHOUT captions, then we’ll have something BIG.

    • @nanow1990
      @nanow1990 months ago +1

      In other words, models have to learn visuals without captions.

  • @CantataOnslaughta
    @CantataOnslaughta months ago +3

    These are the kids you used to pick on in school and now they’re building your robot overlord

    • @victorhugomuzi
      @victorhugomuzi 11 days ago

      I don't think these people went to schools where these types are picked on lol

  • @oowaz
    @oowaz months ago

    But why would the image of a dog firing at the same time as the word "dog" being mentioned be impressive, though? Aren’t those images fed to the model with labels and such, meaning it had reason to connect the word "dog" and its image...

    • @lyeln
      @lyeln months ago

      I think the impressive thing is not the single embedding for "dog", but that if you present an image of a dog, many features can fire at the same time, such as "protectiveness", "loyalty", "wilderness", "playing games", "veterinary knowledge" etc. Basically concepts related to the object "dog" which demonstrates that the model has a fine conceptual and multidimensional understanding of what a dog is.

    • @oowaz
      @oowaz months ago

      @@lyeln Yeah, but I feel like that happens as a result of the nature of the organization: the patterns related to a dog repeat often enough that those connections get reinforced. That part feels intuitive to me... like, would there be a more efficient way to learn and organize things based on the data it’s being fed?

    • @nanow1990
      @nanow1990 months ago

      You are right.
      Models have to learn visuals without captions.
      These people have neither real knowledge nor a foundation to speak about machine learning.

    • @oowaz
      @oowaz months ago

      @nanow1990 If that’s directed at me, I was just asking a question. If that’s the case, the sheer amount of pattern repetition resulting in those associations isn’t necessarily the most impressive thing. What I question is how the researcher can’t think of anything more impressive than a model doing something it was designed and fine-tuned to do; to make correct predictions it must make correct associations. It would be impressive if it made nonsensical connections throughout and still managed to output coherent answers.

    • @nanow1990
      @nanow1990 months ago

      @@oowaz I agree. I was talking about the people in the video.

  • @dharmaone77
    @dharmaone77 months ago +2

    OpenAI office still looks more comfy - oak panelling and more plants

  • @lukalot_
    @lukalot_ months ago +1

    Why does this video look AI-generated to me? I think my AI detection has gone haywire. Or maybe it’s just that the mouths are out of sync and the skin is softened.

  • @grantemerson5932
    @grantemerson5932 months ago

    Claude is vegan

    • @Fyrelangs
      @Fyrelangs months ago

      Based

  • @wi2rd
    @wi2rd months ago +2

    This whole ordeal reminds me of Doom:
    a bunch of mad scientists spend immense amounts of time and money trying to open a gate to another dimension, only to find it was a gate to hell, and a never-ending stream of demons takes over the planet.

    • @keepmehomeplease
      @keepmehomeplease months ago

      Except that is a fairytale and this is REAL LIFE. You are a prime example of “fear what you don’t understand”. Compared to the rest of the world, these people are rather sane, sharing an intimate passion in advancing human knowledge.
      I fear that the real doom is in the silence and ignorance of masses like you.
      Scared? Help us understand it. Angry? Voice your concerns. Tired? So are we.
      Hell is very real, and we are living in it. Who you should FEAR is the man in the mirror remaining absent in the unfolding events of human extinction. Bring food to the table if you want to be fed.

  • @mikezooper
    @mikezooper months ago

    This video isn’t useful to me. I’ll have to do research before it makes sense.
