AXRP
33 - RLHF Problems with Scott Emmons
Reinforcement Learning from Human Feedback, or RLHF, is one of the main ways that makers of large language models make them 'aligned'. But people have long noted that there are difficulties with this approach when the models are smarter than the humans providing feedback. In this episode, I talk with Scott Emmons about his work categorizing the problems that can show up in this setting.
Patreon: patreon.com/axrpodcast
Ko-fi: ko-fi.com/axrpodcast
The transcript: axrp.net/episode/2024/06/12/episode-33-rlhf-problems-scott-emmons.html
Topics we discuss, and timestamps:
0:00:33 - Deceptive inflation
0:17:56 - Overjustification
0:32:48 - Bounded human rationality
0:50:46 - Avoiding these problems
1:14:13 - Dimensional analysis
1:23:32 - RLHF problems, in theory and practice
1:31:29 - Scott's research program
1:39:42 - Following Scott's research
Scott's website: www.scottemmons.com
Scott's X/twitter account: x.com/emmons_scott
When Your AIs Deceive You: Challenges With Partial Observability of Human Evaluators in Reward Learning: arxiv.org/abs/2402.17747
Other works we discuss:
AI Deception: A Survey of Examples, Risks, and Potential Solutions: arxiv.org/abs/2308.14752
Uncertain decisions facilitate better preference learning: arxiv.org/abs/2106.10394
Invariance in Policy Optimisation and Partial Identifiability in Reward Learning: arxiv.org/abs/2203.07475
The Humble Gaussian Distribution (aka principal component analysis and dimensional analysis): www.inference.org.uk/mackay/humble.pdf
Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!: arxiv.org/abs/2310.03693
263 views

Videos

32 - Understanding Agency with Jan Kulveit
207 views · 21 days ago
What's the difference between a large language model and the human brain? And what's wrong with our theories of agency? In this episode, I chat about these questions with Jan Kulveit, who leads the Alignment of Complex Systems research group. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast The transcript: axrp.net/episode/2024/05/30/episode-32-understanding-agency-jan-kulveit.html T...
31 - Singular Learning Theory with Daniel Murfet
525 views · a month ago
What's going on with deep learning? What sorts of models get learned, and what are the learning dynamics? Singular learning theory is a theory of Bayesian statistics broad enough in scope to encompass deep neural networks that may help answer these questions. In this episode, I speak with Daniel Murfet about this research program and what it tells us. Patreon: patreon.com/axrpodcast Ko-fi: ko-f...
30 - AI Security with Jeffrey Ladish
712 views · a month ago
Top labs use various forms of "safety training" on models before their release to make sure they don't do nasty stuff - but how robust is that? How can we ensure that the weights of powerful AIs don't get leaked or stolen? And what can AI even do these days? In this episode, I speak with Jeffrey Ladish about security and AI. Patreon: patreon.com/axrpodcast Ko-fi: ko-fi.com/axrpodcast Topics we ...
29 - Science of Deep Learning with Vikrant Varma
288 views · a month ago
In 2022, it was announced that a fairly simple method can be used to extract the true beliefs of a language model on any given topic, without having to actually understand the topic at hand. Earlier, in 2021, it was announced that neural networks sometimes 'grok': that is, when training them on certain tasks, they initially memorize their training data (achieving their training goal in a way th...
28 - Suing Labs for AI Risk with Gabriel Weil
149 views · 2 months ago
How should the law govern AI? Those concerned about existential risks often push either for bans or for regulations meant to ensure that AI is developed safely - but another approach is possible. In this episode, Gabriel Weil talks about his proposal to modify tort law to enable people to sue AI companies for disasters that are "nearly catastrophic". Patreon: patreon.com/axrpodcast Ko-fi: ko-fi...
27 - AI Control with Buck Shlegeris and Ryan Greenblatt
679 views · 2 months ago
A lot of work to prevent AI existential risk takes the form of ensuring that AIs don't want to cause harm or take over the world - in other words, ensuring that they're aligned. In this episode, I talk with Buck Shlegeris and Ryan Greenblatt about a different approach, called "AI control": ensuring that AI systems couldn't take over the world, even if they were trying to. Patreon: patreon.com/...
26 - AI Governance with Elizabeth Seger
194 views · 6 months ago
The events of this year have highlighted important questions about the governance of artificial intelligence. For instance, what does it mean to democratize AI? And how should we balance benefits and dangers of open-sourcing powerful AI systems such as large language models? In this episode, I speak with Elizabeth Seger about her research on these questions. Patreon: patreon.com/axrpodcast Ko-f...
25 - Cooperative AI with Caspar Oesterheld
284 views · 8 months ago
Imagine a world where there are many powerful AI systems, working at cross purposes. You could suppose that different governments use AIs to manage their militaries, or simply that many powerful AIs have their own wills. At any rate, it seems valuable for them to be able to cooperatively work together and minimize pointless conflict. How do we ensure that AIs behave this way - and what do we ne...
mechanistic anomaly detection (animated explanation)
143 views · 10 months ago
Art by @hamishdoodles Clipped from episode 23 of AXRP: th-cam.com/video/sEWei02m7qk/w-d-xo.html Transcript of that episode: axrp.net/episode/2023/07/27/episode-23-mechanistic-anomaly-detection-mark-xu.html AXRP patreon: www.patreon.com/axrpodcast AXRP ko-fi: ko-fi.com/axrpodcast
24 - Superalignment with Jan Leike
1.4K views · 10 months ago
Recently, OpenAI made a splash by announcing a new "Superalignment" team. Led by Jan Leike and Ilya Sutskever, the team would consist of top researchers, attempting to solve alignment for superintelligent AIs in four years by figuring out how to build a trustworthy human-level AI alignment researcher, and then using it to solve the rest of the problem. But what does this plan actually involve?...
23 - Mechanistic Anomaly Detection with Mark Xu
166 views · 10 months ago
Is there some way we can detect bad behaviour in our AI system without having to know exactly what it looks like? In this episode, I speak with Mark Xu about mechanistic anomaly detection: a research direction based on the idea of detecting strange things happening in neural networks, in the hope that that will alert us of potential treacherous turns. We both talk about the core problems of rel...
Paul Christiano on AI (animated)
327 views · 11 months ago
Paul Christiano on AI (animated)
Survey, store closing, Patreon
46 views · 11 months ago
Survey, store closing, Patreon
22 - Shard Theory with Quintin Pope
545 views · a year ago
22 - Shard Theory with Quintin Pope
21 - Interpretability for Engineers with Stephen Casper
219 views · a year ago
21 - Interpretability for Engineers with Stephen Casper
20 - 'Reform' AI Alignment with Scott Aaronson
889 views · a year ago
20 - 'Reform' AI Alignment with Scott Aaronson
How do neural networks do modular addition?
569 views · a year ago
How do neural networks do modular addition?
What is mechanistic interpretability? Neel Nanda explains.
3.6K views · a year ago
What is mechanistic interpretability? Neel Nanda explains.
Store, Patreon, Video
23 views · a year ago
Store, Patreon, Video
Vanessa Kosoy on the Monotonicity Principle
344 views · a year ago
Vanessa Kosoy on the Monotonicity Principle
19 - Mechanistic Interpretability with Neel Nanda
477 views · a year ago
19 - Mechanistic Interpretability with Neel Nanda
New podcast - The Filan Cabinet
40 views · a year ago
New podcast - The Filan Cabinet
18 - Concept Extrapolation with Stuart Armstrong
101 views · a year ago
18 - Concept Extrapolation with Stuart Armstrong
17 - Training for Very High Reliability with Daniel Ziegler
47 views · a year ago
17 - Training for Very High Reliability with Daniel Ziegler
16 - Preparing for Debate AI with Geoffrey Irving
184 views · a year ago
16 - Preparing for Debate AI with Geoffrey Irving
15 - Natural Abstractions with John Wentworth
244 views · 2 years ago
15 - Natural Abstractions with John Wentworth
14 - Infra-Bayesian Physicalism with Vanessa Kosoy
161 views · 2 years ago
14 - Infra-Bayesian Physicalism with Vanessa Kosoy
13 - First Principles of AGI Safety with Richard Ngo
600 views · 2 years ago
13 - First Principles of AGI Safety with Richard Ngo
12 - AI Existential Risk with Paul Christiano
1.2K views · 2 years ago
12 - AI Existential Risk with Paul Christiano

Comments

  • @Dan-dy8zp 8 days ago

    Except in the unlikely event you actually have the 'optimal policy' properly defined correctly on the first try for the AGI, aren't you . . . *done* ? You can just point to that. You probably won't actually have it properly defined though, because how do you do that (without some risk of getting it wrong)? I feel like defining what we really want is one of the hardest issues in AI.

  • @dizietz 12 days ago

    Any other interesting work to recommend on the idea that our senses/control mechanisms are both generative processes and predictive processes? I also had some more to add on the topic of why there isn't the convergent behavior that might seem obvious. Like Jan mentioned, there might be local environment state differences, prediction optimization on a longer timescale has potential errors, etc. But there are also potential 'hardware' and 'software' differences as well -- humans don't run on completely homogenized brains, and it's possible to imagine that the initialization of various weights in our brains is randomly distributed in a way that yields different outcomes.

    • @axrpodcast 8 days ago

      Nothing I can think of beyond what's in the episode notes, I'm afraid.

  • @spaceprior 13 days ago

    This was great to hear, aside from the volume, which was too low. I had to max out my phone volume and it still wasn't quite high enough. I guess it's fine for me on closed-back headphones, but anything would be.

    • @axrpodcast 8 days ago

      Thanks for the feedback - the latest episodes are mixed to be somewhat louder, which should hopefully help.

  • @CatOfTheCannals a month ago

    "not doing mech interp would be crazy"

  • @dizietz a month ago

    This one was pretty technical for those of us who haven't read some of the foundational work for SLT. I had to stop and look up some specific details later, and I still don't feel like I fully grasp what makes SLT different from other predictions about degeneracy and preference for simple functions in terms of making predictions about neural network behavior. David's framing of fundamental structures in the data being more important across training runs makes a lot of sense, but I still don't grok how this helps with alignment. I suppose understanding the stability of structure moves us closer, both on something similar to interpretability and on capabilities.

  • @nowithinkyouknowyourewrong8675 a month ago

    Love it, I've run CCS and I learned a lot. You guys had a tranquil vibe too
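
A note on "CCS": this presumably refers to Contrast-Consistent Search (Burns et al., 2022), the "fairly simple method" for extracting a model's beliefs described in the episode 29 blurb above. As a rough sketch of the core idea (variable names here are illustrative, not taken from the paper's code), a probe over contrast-pair activations is trained with an unsupervised loss along these lines:

import torch

def ccs_loss(p_pos, p_neg):
    # p_pos / p_neg: probe outputs in (0, 1) for the "X is true" and "X is false"
    # phrasings of the same batch of statements.
    consistency = (p_pos - (1.0 - p_neg)) ** 2     # the two answers should be complementary
    confidence = torch.minimum(p_pos, p_neg) ** 2  # rules out the degenerate "always 0.5" probe
    return (consistency + confidence).mean()

The consistency term asks the probe to treat a statement and its negation as mutually exclusive, and the confidence term keeps it off the fence; neither term uses ground-truth labels, which is what makes the method unsupervised.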

  • @nowithinkyouknowyourewrong8675 a month ago

    10 mins in and I still don't get why it's interesting? Like, it's a math stats tool that he is still making, that will enable us to do other things.

  • @akmonra a month ago

    Just a few minutes in, and he gets the basics of low-rank adapters completely wrong. Starting to wonder how much he actually understands.

    • @tylertracy965 a month ago

      In what ways? Could you provide examples so future listeners can understand them correctly?

  • @teluobir a month ago

    When you think about it, if you sum up all the "like"s, they take 105 minutes of this video, the "yeah"s take 40 minutes, and the "um"s take about 15 minutes… Take them out and you'd have a much more digestible video.

    • @tylertracy965 a month ago

      Unfortunately, some of the best researchers out there aren't the most fluent with speech. It didn't distract from the overall conversation for me.

  • @turtlewax3849 a month ago

    Who is going to secure you from yourself and the AI securing you?

  • @matts7327 a month ago

    This is a really nice deep dive not only on AI, but security and the state of the industry in general. Bravo!

    • @axrpodcast a month ago

      Thanks :)

  • @dizietz a month ago

    I've been loving this new stream of content on spotify during long drives! Daniel you are pretty well up to date on papers generally, I am always impressed.

    • @axrpodcast a month ago

      Glad to hear you like these :)

  • @MikhailSamin 2 months ago

    If a factory near my house produces toxic chemicals that have a 1%/year chance of killing me but doesn’t affect my health otherwise, is it possible for me to sue the factory and have a court order them to stop? (In Europe, in this situation you can probably sue the government if they don’t protect you against this, and have the ECHR or a national court order the government to stop the factory from doing it and possibly award a compensation.)

  • @dizietz 2 months ago

    Thanks -- it's been a while since AXRP released something!

    • @axrpodcast 2 months ago

      Alas it's true - but hopefully it won't be as long before the next episode :)

  • @OutlastGamingLP 2 months ago

    "Even if the stars should die in heaven Our sins can never be undone No single death will be forgiven When fades at last the last lit sun Then in the cold and silent black As light and matter end We'll have ourselves a last look back And toast an absent friend" Sorry. Feeling angsty about the world today. I had a friend in highschool who I'd sometimes complain about my problems to. She'd always say the same line in reply, and I couldn't argue. "Well, do better." That meme stuck around in my head. "Do better." It's a weird place to be. "Oh, these lab leaders think there's a 20% chance of doom." "So they haven't ruled out doom?" "Well, no, they just think it's unlikely." "I wouldn't call 10-20% 'unlikely' when we're talking about 'literally everyone dies and/or nearly all value in the future is irrevocably lost,' but okay, why do they think its possible but less likely than throwing heads in a coin flip." "Well, they don't really explain why, but it's something like 'human extinction seems weird and extreme, and while they can imagine it, they feel much more compelled by other grand and wonderful things they can imagine' - at least, that's the vibe I get." "Annnnd we don't think there's some kind of motivated cognition going on here? I think people buying lottery tickets are also imagining very vividly the possibility of them winning, but that doesn't make them right to say whatever % they feel intuitively." "They'd say AI is more like the invention of agriculture than a lottery. Like, maybe you make some huge foreseeable mistake and cause a blight, but if you have some random list of virtues like 'common sense' or 'prudence' or 'caution' then you'll probably just make a bunch of value." "I think Powerball is a good metaphor. Let's take features of the universe we'd all want to see in the future and tag them with a number. We then play a million number Powerball and hope each one of those numbers we chose show up. What are the odds that will happen? 80%?" "This sounds like a wonderful argument on how to reason about a specific kind of uncertainty, but people don't want to reason about uncertainty, they want to reason about how their most convenient and readily actionable policy is actually totally fine and probably not going to be an unrecoverable catastrophe." "Well, they should do better." "I appreciate the sentiment, though I would like to note that in this case this has to be nearly the largest understatement in the 13.8 billion years of the universe." "Here's another: I'm pretty bummed out about this."

    • @OutlastGamingLP 2 months ago

      "Is there any strategy these models can deploy that would allow them to cause problems." Has anyone in the Black Dolphin prison ever managed to kill a guard or another prisoner? Not sure, but I'd guess probably 'yes.' And those would just be humans, not even particularly selected for their intelligence, just selected for their 'killing intent' and prior extreme bad behavior. An AI that's as smart as our civilization working for an entire year per hour of IRL computer runtime will find any medium bandwidth channel of causal influence more than sufficient to destroy the world. Even if you give it a 1 bit output channel and iterate on asking it "yes/no" questions, that probably adds up to a lethally dangerous level of causal engagement with our universe eventually. Even if you reset it after every question, 0 online learning, it can probably guess its position in the sequence if the input contains enough deliberate or accidental shadows of the intervention the prior instances of the system have done. "Safe no matter what" sounds great, but it's like saying some product is "indestructible" - well, you're failing to imagine the maximum forces that can be brought to bare on the object. Specifically, a sheer and strict 'whitelist' policy is only as safe as your ability to predict the consequences of every action you whitelist, and if you could predict all of that, then the AI is no better than a Tool Assisted Speedrunner program or a manufacturing robot. It can precisely and quickly do only as much good as humans could do slowly and less precisely. As soon as you're getting "superhuman" you need something that does superhuman-level human-value alignment. Your merely human-level control/safety techniques will be insufficient to cover that wider space. You've got a relay in a circuit that's meant to carefully switch off a power supply when there's a surge, and it looks super safe and reliable, since you can prove it successfully activates and breaks the connection even up to the level of the capacitor banks that fire inertial confinement lasers. And yet, in practice, the surge comes through, the relay flips, and there's a white hot arc through open air as the electric field shreds air molecules into plasma, and the energy grounds out in the delicate machinery - now molten slag - you had downstream of that relay. That jump through open air is the problem. That's what outside the box is pointing to. Good luck constraining safe operation beyond the box, when you can't see that region in advance, and if you aren't trying to go outside of the box, then why the hell are we even trying to build this stuff.

  • @mrbeastly3444 2 months ago

    22:34 "roughly human level"... Ok, but even if this works, what are the odds that AI Labs will only use "roughly human level" AI Agents? E.g. Rather then "the best AI agents available". It seems likely that "roughly human level AI" will only be "roughly human level" for weeks or months, when they are then replaced with 2x or 10x versions. Even if you were able to contain a "roughly human level AI agent", this could be a very temporary solution? Would "roughly human level AI agents" be able to safely do any useful testing and alignment work on an ASI level model? Doing alignment work (goal and intention testing, capability testing, red teaming, even containment) on an ASI would likely require greater then "roughly human level AI agents"?

  • @Dan-dy8zp 3 months ago

    Religiosity is considered 25%-60% heritable.

  • @Dan-dy8zp 3 months ago

    Also, it seems relevant that bilateral anterior cingulate cortex destruction produces a psychopath, at any age. That seems a very important point. We don't really learn moral behavior from blank-slate exposure to our world. We learn morality the way we learn to walk: it's perfecting our strategy for doing something we basically already know how to do instinctively.

  • @Dan-dy8zp 3 months ago

    There's a lot more than heartbeat and such that we are born with. We expect three dimensions of space and one of time, and that we are agents with preferences over future states of the world. We expect other things that move, or that have what appear to be 'eyes', to be other agents. We try to figure out what those agents 'want' soon after birth. We can exhibit jealousy by 3 months of age. We can recognize some facial expressions instinctively. People who never had a limb can have phantom limb syndrome, so there is a mental map of a normal human body. Probably many, many more things.

  • @T3xU1A 4 months ago

    Excellent interview with Scott. Thank you for posting!

    • @axrpodcast 4 months ago

      Glad to hear you liked it :)

  • @goodleshoes 5 months ago

    Yayaya subbed.

  • @fbalco 6 months ago

    The number of times "like" is said is painful, considering these are relatively intelligent people. It was very "like" distracting and "like" difficult to "like" take them "like" seriously.

    • @Idiomatick 6 months ago

      Goes to show you that slick speaking skills aren't related to intelligence.

    • @akmonra 4 months ago

      I tend to find Leike's speaking style really difficult to listen to. Unfortunately, everything he says is pretty valuable, so you just kind of have to pull through.

  • @tirthankaradhikari4557 6 months ago

    Tf went on here?

  • @LEARNTAU 6 months ago

    Democratizing AI is decentralized AI, which is the goal of TauNet.

  • @maxnadeau3200 6 months ago

    these should be called “axrp excerpts”

  • @Words-. 6 months ago

    What if we have an AI that does this for us? And an AI that interprets the interpreter, and so on. Maybe an AI wave process, in order to give us a constant state of interpretation of what is going on.

    • @reidelliot1972 3 months ago

      There are approaches that use this tactic for outer alignment. I highly recommend checking out the classics: Christiano IDA and debate, etc. It's definitely a common motif in this area of research. But then again, I've seen people raise concerns that automating interpretability tools may enable deceptively aligned policies/agents to further entrench themselves. Check out "AGI-Automated Interpretability is Suicide" by RicG

    • @user-vt4bz2vl6j 2 months ago

      That's great, but how would you know it's doing it correctly...

    • @Words-. 2 months ago

      @@user-vt4bz2vl6j That is a fair question, idk. But at least it's a step.

  • @Words-. 6 months ago

    Thank you!

  • @unisloth 7 months ago

    I think this field is so interesting. I really hope Scott Aaronson will release his paper soon.

  • @chrisCore95 8 months ago

    "Mech interp is not necessary nor sufficient."

  • @InquilineKea 8 months ago

    Is Taleb really into Knightian uncertainty?

  • @DylanUPSB 10 months ago

    But what if I listen to my podcasts on youtube 😢

    • @axrpodcast 10 months ago

      Try Google Podcasts - you can listen in your browser, and it lets you adjust the speed! podcasts.google.com/feed/aHR0cHM6Ly9mZWVkcy5saWJzeW4uY29tLzQzODA4MS9yc3M And sorry - back when I uploaded this, only a small fraction of my AXRP listens were on YouTube. Now that more are, it might make sense to cross-post to YouTube (altho I haven't been as active on The Filan Cabinet recently).

  • @PaulWalker-lk3gi 10 months ago

    So, if this video is part of the "bad AI's" training set will it be able to use this information to help mask its anomalous behavior?

  • @campbellhutcheson5162 11 months ago

    I think part of the problem is that this is actually an explanation of why the neural network's method works, not an explanation of what its method actually is, since the network hasn't learned the concept of the trig functions; it's just learned how to embed the inputs (0-113) on a lossy version of the trig curves etc... A mechanical description, I think, would also be clearer to a less math-y audience. It feels to me like the (quite excellent) authors saw the math that they were familiar with and homed in on why it worked, rather than giving just a straight account of what the network is doing in a step-by-step fashion.
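
The trig-curve mechanism being discussed here can be illustrated in a few lines. Below is a minimal numpy sketch of the trig-identity trick described in the grokking write-ups; the frequencies are arbitrary placeholders (a trained network selects its own), so treat this as an illustration of the math rather than a description of any particular model:

import numpy as np

p = 113             # modulus from the modular-addition grokking setup
freqs = [3, 7, 11]  # illustrative frequencies; a trained network picks its own

def mod_add_via_trig(a, b):
    """Recover (a + b) % p from sin/cos features of a, b, and each candidate answer c."""
    c = np.arange(p)
    scores = np.zeros(p)
    for k in freqs:
        wa, wb, wc = 2 * np.pi * k * a / p, 2 * np.pi * k * b / p, 2 * np.pi * k * c / p
        cos_ab = np.cos(wa) * np.cos(wb) - np.sin(wa) * np.sin(wb)  # cos(wa + wb)
        sin_ab = np.sin(wa) * np.cos(wb) + np.cos(wa) * np.sin(wb)  # sin(wa + wb)
        scores += cos_ab * np.cos(wc) + sin_ab * np.sin(wc)         # cos(wa + wb - wc)
    return int(np.argmax(scores))  # peaks exactly at c == (a + b) % p

assert mod_add_via_trig(95, 40) == (95 + 40) % p

Each candidate answer c is scored by a sum of cos(2πk(a + b - c)/p) terms, assembled purely from sines and cosines of a, b, and c via the angle-addition identities, and the score is maximized exactly when c equals (a + b) mod p.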

  • @dizietz 11 months ago

    Do you plan to publish the results from the survey?

    • @axrpodcast 11 months ago

      Not at the moment, but I could do so if there were sufficient interest. There are only 20 responses, so I'm not sure how interesting it will be for others.

  • @BrainWrinklers a year ago

    Hey Quintin, wanna come on our show? We talk rationality and AI safety + alignment.

  • @antigonemerlin a year ago

    The images are quite helpful, especially for a complete beginner to the field when it comes to terms like stochastic descent. This channel is very underrated.

    • @axrpodcast a year ago

      Thanks - nice to hear!

  • @DavosJamos a year ago

    Beautiful interview. Great questions. You got the best out of Scott. Really interesting.

    • @axrpodcast a year ago

      Glad to hear you liked it :)

  • @bobtarmac1828 a year ago

    Yes, but what about Ai Jobloss? Or Ai as Weapons? Why can’t we Cease Ai / GPT immediately? Pausing Ai before it’s too late?

  • @ditian6234 a year ago

    Excellent podcast! As a new undergraduate in computer science, I find this truly inspiring for getting into the field of AI safety.

  • @halnineooo136 a year ago

    Looks like YT isn't your primary platform

    • @axrpodcast a year ago

      Yep - AXRP is primarily a podcast, available wherever good podcasts can be accessed (e.g. spotify, apple podcasts, google podcasts). You can also read a lovingly-compiled transcript here: axrp.net/episode/2021/12/02/episode-12-ai-xrisk-paul-christiano.html

  • @dizietz a year ago

    Great podcast!

  • @rstallings69 a year ago

    Just my two cents... I have a background in civil engineering and medicine, not computer science, but I am extremely good at connecting dots. The existential risk is not 20% in my opinion; I would guess the chance of shit NOT hitting the fan is less than 5%, based on everything I've heard about the black-box nature of current systems and how far behind alignment research is in terms of funding as well as progress. This guy has like 3 yrs past school and he considers himself an expert? Sorry to be a troll, but I am an extremely logical person, and the nature of code and human psychology and the nascent exponential increase in the power of AI and its recent open-sourcing make me very pessimistic. I really hope I'm wrong... Even if it doesn't kill us directly, there are a lot of malicious actors that will use open-source AI to create malicious code, bots and viruses to totally disrupt our current digital society, not to mention the energy cost of using this technology. Where am I wrong? This is the big problem to my mind: even if AI itself is totally beneficent, if it's used for malevolent purposes by some and it's extremely powerful, are we not hosed? What if it allows bad actors to hack into the nuclear weapons sites? Or someone can easily create code which will shut down energy grids, which are all digital these days? Not to mention creating dangerous nanotech; the list goes on. Am I naive or are you?

    • @pankajchowdhury9755 a year ago

      Yeah, you are being naive. First of all, please listen to my man correctly here. You are talking about probabilities, but the 20% he is talking about is not the chance or probability - it is the expected value of the human potential. Also, he was mainly saying that for alignment misses. Your worry about malicious actors is not an example of misalignment. And even in that case you are being naive in thinking that these guys are not thinking about that. Also, if you look, this podcast is 1 year old, and ChatGPT/other generative models weren't even on anyone's radar, not even OpenAI's (they did not expect this much performance). It's easy to come in with hindsight here and say look at all the exponential progress, AI is going to doom us, etc. etc. You think that even if there was no AI, there would not be classical handcrafted algorithms that would be able to create software and malware at a faster rate? Do you know how much better classical algorithms have become over the years? The malicious actor thing has always been on everyone's list.

  • @briankrebs7534 a year ago

    Really loving the confrontational tone that starts to seep out about halfway into this lol

  • @briankrebs7534 a year ago

    So the bird's eye view hypothesis ought to make predictions about when and where the instances of the agent's source code ought to be occurring and what sort of inputs the agent ought to be subjectively perceiving, and if the inputs it is actually subjected to are in agreement with the hypothesized inputs, then there is a good match between the hypothesis and the objective?

  • @anabh4569 2 years ago

    For the YT chapter feature to work, you need the description timestamps to each be on separate lines.

    • @axrpodcast 2 years ago

      Huh - this is being crossposted from libsyn, where the description has dot-points for the timestamps (and also links for the research being discussed). Lemme try to fix.

    • @axrpodcast 2 years ago

      OK, the timestamps may or may not work now.