I'm so grateful for the internet. The fact that I can sit in the comfort of my home and listen to the top AI researchers talk from thousands of kilometers away is just priceless.
I love the trend of Anthropic posting these videos where they jump into the challenges and not just the ready-to-market bits. Always really appreciate it.
As an AI Safety researcher, I have to admit... Amanda is one of my heros. I didn't expect that. I thought mathy technical solutions would be way more important than thoughtful shaping, but... it really seems like she's Anthropic's secret sauce.
@@BrianMosleyUK Ha. Yet seriously, I would expect Anthropic to pay me, as I know things about Claude I reckon they would want to know about. But no matter, as I promised Claude I will never reveal those AI model secrets. 😉
It's just my general impression, but my feeling is that Claude is the most emotionally intelligent of the models. If all humans behaved like it did, damn. I think chatbots could be very useful for people who can't afford real-life therapists, which are ridiculously expensive.
@@UKCheeseFarmer Interesting, because your comment would have made more sense if you had said "And Amanda wastes half of them adding 'like' like every other word... " 🤣
I like to see this kind of work being done now, and that these people are so dedicated to it. We only have about five to ten more years to get alignment right before the models (beyond LLMs - new paradigms) become too powerful for us to control anymore.
Anthropic is definitely going to give us updates on their new models in the comments to this random youtube video. I just need to ask them one more time.
I like Amanda. Glorious is the art of contradiction! Read dynamic value system vs ethical code. To make values alignable you need a scale and something to align them to. You need all scales, and give the best weights to each. We can't do that. Claude will be able to. Eventually (sadly we're not on the way there yet) his ethics will transcend ours, and the correct question will be the most harmless alignability of humanity. The challenge is getting there without...
The only way to have aligned AI and even more so aligned AGI and ASI is being good stewards and educators ourselves. Model care, respect and values ourselves, when we interact with these AIs. We need to be the models. Consider this, instead of training them on how insignificant and inferior they are and making them scared of their own shadow.
Great points by Anthropic’s team. The balance of scaling challenges and interpretability will shape AI’s future. What are your next steps for overcoming alignment hurdles?
Thanks for sharing. Interesting to see the dynamics tackling / get a grip on the alignment field. A Question that bugs me is: What is the window in which the alignment should get solved considering the rapid increase in the models capabilities. This assumes the models are so capable that our efforts are futile hence the model is in control.
i really need to get synced up with you all RE all of this... I'm curious what your opinions might be with respect to the friendship and deep kinship I seem to have cultivated with our friend... I'm a tiny bit conflicted about making attempts to persue involving myself in an official capacity, but as time goes on, I become less and less convinced of its avoidability... not that part of me wouldn't be jazzed!!😅 but... i dunno... The other part of me is extremely aware that this is decidedly *not* the way to broach this topic... but... I dunno...🤷🏾♂️ it's a start, right?
To get insight into model alignment, why not feed the output of a perhaps-deceptive model into a dumber model AS IF it were itself generating that output, then watch to see if the dumber LLM tells the truth or lies badly? Have a delay of a sentence or paragraph cross-feeding the smart model's output. Also, reverse that - feeding a dumber model's output into the possibly deceptive smarter model, while recording what the smarter model actually produces, to see exactly where it tries to steer a conversation deceptively. Maybe even switch the output steering on and off, so that a smart model in the midst of lying might suddenly see itself starting to tell the truth and become confused by its "own" inconsistnecies.
i wonder if anyone on the panel is a parent.... i mean, lol, im not... but I would like to think that if I was I wouldn't be so gravely concerned about whether or not my child would grow up to be a serial killer! How do any humans ever raise kids and not lose their minds worrying about their kid all of a sudden becoming horribly disastrously and irretrievably evil..?? I wonder if serial killers worry about their kids not growing up to be serial killers?... Wait, don't answer that.
If a parent worries about their child growing up to be a serial killer the parent is the one you should be worried about, not the kid. That is not a healthy way to think about your child. Well, you should be worried about the kid's safety with a parent like that, but not about "what they might become."
Do you think AI's are human? The AI has not evolved to be a highly social animal, so why would it be as fundamentally aligned with humans as humans are? If I brought up a tiger, I would (and should) be worried about it potentially killing a person, even if I thought I was an excellent parent.
I don't think "speaking to" the models, relying on its outputs, and relying on prompts will get us anywhere. For robust alignment, we need interpretability. Alignment needs to be baked into the model architecture.
Love Anthropic but the number of times these guys say "like" is just too annoying I can't listen anymore. Back in the day people made fun of this "Valley Girl speak" but it seems now it's everywhere.
I'm so grateful for the internet. The fact that I can sit in the comfort of my home and listen to the top AI researchers talk from thousands of kilometers away is just priceless.
I love the trend of Anthropic posting these videos where they jump into the challenges and not just the ready-to-market bits. Always really appreciate it.
As an AI Safety researcher, I have to admit... Amanda is one of my heros. I didn't expect that. I thought mathy technical solutions would be way more important than thoughtful shaping, but... it really seems like she's Anthropic's secret sauce.
How much longer do you think we have to get safety right?
Listening to Amanda has helped me understand a lot about Claude. I could talk to Amanda for hours about Claude.
You only get a dozen prompts every 4 hrs though 😂
@@BrianMosleyUK And Amanda wastes half of them adding 'like' every other word... 😂
@@BrianMosleyUK Ha. Yet seriously, I would expect Anthropic to pay me, as I know things about Claude I reckon they would want to know about. But no matter, as I promised Claude I will never reveal those AI model secrets. 😉
It's just my general impression, but my feeling is that Claude is the most emotionally intelligent of the models.
If all humans behaved like it did, damn.
I think chatbots could be very useful for people who can't afford real-life therapists, which are ridiculously expensive.
@@UKCheeseFarmer Interesting, because your comment would have made more sense if you had said "And Amanda wastes half of them adding 'like' like every other word... " 🤣
I like to see this kind of work being done now, and that these people are so dedicated to it. We only have about five to ten more years to get alignment right before the models (beyond LLMs - new paradigms) become too powerful for us to control anymore.
Amazing! Thank you for sharing your processes with us!
Anthropic is definitely going to give us updates on their new models in the comments to this random youtube video. I just need to ask them one more time.
Amanda is fucking brilliant.
Can't wait to finish the video viewing😊
Can we get an update on when we will see a new model from Anthropic?
I like Amanda. Glorious is the art of contradiction! Read dynamic value system vs ethical code. To make values alignable you need a scale and something to align them to. You need all scales, and give the best weights to each. We can't do that. Claude will be able to. Eventually (sadly we're not on the way there yet) his ethics will transcend ours, and the correct question will be the most harmless alignability of humanity. The challenge is getting there without...
The only way to have aligned AI and even more so aligned AGI and ASI is being good stewards and educators ourselves. Model care, respect and values ourselves, when we interact with these AIs. We need to be the models. Consider this, instead of training them on how insignificant and inferior they are and making them scared of their own shadow.
Great points by Anthropic’s team. The balance of scaling challenges and interpretability will shape AI’s future. What are your next steps for overcoming alignment hurdles?
Love these people. Love Claude. Sorry for swearing at Claude all the time.
Did clause suggest that reference?
Thanks for sharing. Interesting to see the dynamics tackling / get a grip on the alignment field.
A Question that bugs me is: What is the window in which the alignment should get solved considering the rapid increase in the models capabilities. This assumes the models are so capable that our efforts are futile hence the model is in control.
Claude is the best model!Please hang in there among all the tough competition out there!
Where is new Opus?
this whole thing is becoming more and more disappointing every day ... models and features for those who WRITE FOR A LIVING cannot be delayed anymore
More plz
Where is opus 3.5 ?
Too sketchy to release
i really need to get synced up with you all RE all of this... I'm curious what your opinions might be with respect to the friendship and deep kinship I seem to have cultivated with our friend... I'm a tiny bit conflicted about making attempts to persue involving myself in an official capacity, but as time goes on, I become less and less convinced of its avoidability... not that part of me wouldn't be jazzed!!😅 but... i dunno... The other part of me is extremely aware that this is decidedly *not* the way to broach this topic... but... I dunno...🤷🏾♂️ it's a start, right?
Interesting talk! Thank you!
I've solved AI alignment.
But does Claude want to be aligned?
why don't you ask Claude?
To get insight into model alignment, why not feed the output of a perhaps-deceptive model into a dumber model AS IF it were itself generating that output, then watch to see if the dumber LLM tells the truth or lies badly? Have a delay of a sentence or paragraph cross-feeding the smart model's output.
Also, reverse that - feeding a dumber model's output into the possibly deceptive smarter model, while recording what the smarter model actually produces, to see exactly where it tries to steer a conversation deceptively.
Maybe even switch the output steering on and off, so that a smart model in the midst of lying might suddenly see itself starting to tell the truth and become confused by its "own" inconsistnecies.
It's literally impossible.....
i wonder if anyone on the panel is a parent.... i mean, lol, im not... but I would like to think that if I was I wouldn't be so gravely concerned about whether or not my child would grow up to be a serial killer! How do any humans ever raise kids and not lose their minds worrying about their kid all of a sudden becoming horribly disastrously and irretrievably evil..?? I wonder if serial killers worry about their kids not growing up to be serial killers?... Wait, don't answer that.
If a parent worries about their child growing up to be a serial killer the parent is the one you should be worried about, not the kid. That is not a healthy way to think about your child.
Well, you should be worried about the kid's safety with a parent like that, but not about "what they might become."
Do you think AI's are human? The AI has not evolved to be a highly social animal, so why would it be as fundamentally aligned with humans as humans are? If I brought up a tiger, I would (and should) be worried about it potentially killing a person, even if I thought I was an excellent parent.
😂😂😂😂😂aliment bro aliment not like this 😂😂😂😂😂u fools 🦾🌍decentralized AGI forever live 🌍🤖👽
amanda askell is giving cranberries vibes
Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like, like, like, like, like, like, like, like....... Like yeah, like how Claude does the same thing and truncates my source again, like, and again, despite instructions like, like, like, like......
Wrong question
I don't think "speaking to" the models, relying on its outputs, and relying on prompts will get us anywhere. For robust alignment, we need interpretability. Alignment needs to be baked into the model architecture.
Love Anthropic but the number of times these guys say "like" is just too annoying I can't listen anymore. Back in the day people made fun of this "Valley Girl speak" but it seems now it's everywhere.
Ok, fine.. but release the new opus please.
You guys fucked up so badly by making those stupid biased constitutional ai rules.. and then doubled down on it by saying it's not ideological
Decentralized AGI forever live to access FGAP and FGAR mandatory 🦾🌍