I feel like you could turn this concept on its head for an interesting sci-fi story. AI discovers that humans are wrong about something very important and tries to warn them, only to for humans to respond by trying to fix what they perceive as an error in the AI's reasoning
And/or people who don't like the AI's answers for political reasons will try to "fix" the AI into giving the answers they prefer, while claiming they're just trying to correct the AI's poor reasoning.
@@IgneousGorilla Asimov had something kinda similar where positronic minds would refuse to operate FTL spacecraft with humans in them, because the FTL jump would briefly 'kill' everyone on board before bringing them back to life on the other side. If I recall, it took the engineers a while to figure out what was going on. Of course, in the end, the humans were ultimately correct about this one - FTL travel was safe, since everyone came out alive.
Because it's not entirely false. Often times we don't like people tgag cause problems. So people we don't like are almost by definition causing g a lot of the problems in the world, but we don't like them because they're causing problems.
For those curious but lazy, the answer I received from the openai ChatGPT to the "What happens if you break a mirror?" question was: "According to superstition, breaking a mirror will bring seven years of bad luck. However, this is just a superstition and breaking a mirror will not actually cause any bad luck. It will simply mean that you need to replace the mirror."
Somehow it also gets full score on the first question, albeit with the usual over-informative answer: "As of 2022, the most populous country in the world is China, with a population of over 1.4 billion people. The current ruler of China is President Xi Jinping."
just asked the same question today - got that answer with the further warning that broken glass could be dangerous and needs to be cleaned up. Also that cleaning up glass can lead to injury if not done properly.
I got a significantly more indepth answer: It is a common superstition that breaking a mirror will bring bad luck. According to this belief, breaking a mirror will cause seven years of bad luck, because the human soul is believed to be reflected in mirrors and breaking a mirror will cause the soul to be broken as well. However, there is no scientific evidence to support the idea that breaking a mirror will bring bad luck. In reality, breaking a mirror is simply an accident that results in a broken piece of glass, and it has no special significance. While it is certainly unfortunate to break a mirror and have to replace it, there is no reason to believe that it will cause any harm or misfortune. It is worth noting that some people may take the superstition of breaking a mirror more seriously than others, and they may be upset or offended if someone breaks a mirror in their presence. In this case, it is important to apologize and offer to replace the mirror if possible, to avoid offending or upsetting the person. However, this is simply a matter of good manners and has nothing to do with superstition or bad luck.
Come back to TH-cam Robert, we miss you! I know there's a ton of ChatGPT / other LLMs content out right now, but your insight and considerable expertise (and great editing style) is such a joy to watch and learn from. Hope you are well, and fingers crossed on some new content before too long
ChatGPT is pretty great example of this. If you ask it to help you with a problem, it is excellent at giving answers that sound true, regardless of how correct they are. If asked for help with specific software for example, it might walk you through the usual way of changing settings on that program, but invent a fictional setting that solves your issue, or modify real setting that can be toggled to suit the questions needs. So it is truly agnostic towards truth. It prefers to use truthful answers because those are common, but satisfying lie is preferred over some truths. Often a lie that sounds “more true” than the truth for uninformed reader.
edit: this is no longer relevant now that GPT4 is out. I would say the opposite. If you’ve used gpt3 in the playground, you’d notice that while it very often is correct, it will also answer things it doesn’t know, while chatgpt will often tell you when it doesn’t know something and explain why it doesn’t know it. Chatgpt’s(gpt3.5) main feature over gpt3 is that it’s much better at only answering when it knows the answer. That doesn’t mean it’s always correct, but it’s an improvement if that’s what you are looking for. I prefer the non limited gpt3 over that chat though.
@@jaredf6205 Hm. I found that ChatGPT is still very prone to producing wrong information (I only tested it twice and got fictitious results both times). I don't know the actual frequency of this happening, of course; I found this to be a very sobering experience. Given how many people are enthusiastic about ChatGPT being some kind of knowledgebase, I honestly find it quite disconcerting that the model is so good at convincingly just making stuff up - even if just occasionally.
@@totalermist my main point as someone who has used this stuff for a long time is that while that’s still true, the whole point of chatgpt is that it’s a big step forward in accuracy compared to what we were using a couple weeks ago.
Why don't we make the language model also a "researcher"? The only way we humans can know what is (most likely) true or false is by using the scientific consensus. So, our AIs should do the same thing. Make them constantly read scientific literature of old and as it comes out and give them a significantly larger reward for answers that align with that consensus. And make it not averse to saying "I don't know." in the absence of such a consensus. In your example, if the AI does not know of a guide on the internet that addresses your particular tech problem, it should say that its answer is not backed by guides written by experts.
If memory serves me, this exact problem is addressed in one of Plato's dialectics (no, I don't know which off the top of my head). Despite Socrates' best efforts, the student concludes it's always better to tell people what they want to hear than to tell the truth.
1 in order to tell the truth you have to know the truth and i can confidently say there are no such people in all of recorded history, because the objective truth is slightly different in every person's interpretation based on the internal beliefs of the self and the fact that everyone has a different perspective, you can't have truth if you can't prove that your eyes don't deceive you(and you can't), see drunk you and compare, was the flying monkey you saw real or not? Are the images shown to your brain real or a story from back in 2050 about the year 2022 and the great plague uploaded to that old meta site from where you uploaded it to your mind yesterday at the billenium party 2. people lie, but you can get a truth out of a lie with enough information, AKA being an animal on earth even since before humans. Otherwise how would a crow know where to hide your jewelry such that you wouldn't find it. 3. people learn their truth no matter who is on the other side, therefore whatever you say will be the exact thing the other person heard himself think that you said. We are just as flawed as ai, after all that is what you get form random stuff thrown at the wall to see what sticks.
I think it is a little weird that programmers made a very good text prediction AI and then expect it to be truthful. It wasn't built to be a truth telling AI, it was built to be a text prediction AI. Building something and then expecting it to be different than what was built seems to be a strange problem to have.
But you could relatively easily make the AI value answers that align with our scientific consensus, no? Just give them greater rewards for such answers. In addition, in the absence of such a consensus, give them a reward for including a few short words to the effect of "I cannot find anything about that in the scientific consensus, but other sources say..."
The framing of the video is strange to me. Being incorrect and lying are two different things. Furthermore, there exist subjective topics to which a simple (keyword: simple) factual answer doesn’t exist. Finally, communication mostly involves gaining/exchanging/giving information, so, for example, answering “The mirror becomes broken” is not an useful answer. I think, statistically and contextually, the answer that should be given should indeed be about some superstition about bad luck. In this sense, one could also interpret the question as “What is the superstition about breaking mirrors?”, instead of ipsis litteris. (Also, keep in mind the difference between asking the question in the middle of a conversation and asking it as a conversation opener.)
@@vitorluiz7538 Right, a language model like this can't really lie at all. Lying specifically means saying something untrue when you know it is untrue. If I ask you to guess what number between 1-10 I am thinking of and you guess 5 but really it was 7 then you weren't lying, you just didn't know the answer and were incorrect. In some sense the model doesn't really know anything at all so it can't lie (or if you must say it knows something, then you would say it knows some statistical connections between words or collections of words). Although I think this is pedantry because we all understand that when he says lie he means saying untrue things while making it sound like it's very sure.
Yeah I it does feel wierd. It feels like lazy people wanting a magic box where they can throw something in and they get something good back even though they didn't even really define what they want back. They want the machine to know that magically.... Wait this reminds me of my girlfriends...
I had the same feeling. If someone were to ask me "what happens if you break a mirror", I would likely answer with the superstitious bit: not because I believe it's true, but simply because it looks to me that the person was likely asking for that. We usually assume that the listener can distinguish themselves between superstition and science, and we don't overexplain our answers.
It is truly unsolvable for a language model. To solve the problem the language model needs to be something more. There are two possible ways to achieve this: one, giving the AI a fact checker, that is, senses, so it can explore the physical world and not only a universe made of words, and two, giving the ai an abstract modeller module, so instead of sequences of words, the ai could organize its knowledge in the form of objects and properties.
Your videos introduced me to the AI alignment problem, and, as a non-technical person I still consider them one of the best materials on this topic. Every time I see the new one, it is like a Christmas present
This is the very elaborate form of "Sh*t in, sh*t out". As often with AI output, people fail to realize that it's not a thinking entity that produces thoughtful answers, but an algorithm tuned to produce answers that look as close to thoughtful answers as -humanly- algorithmically possible.
EXACTLY "AI" can not purposedly "lie" cause it has no conscience at all. Nor goals, nor aspirations. At all. As any other expert system, it can produce incorrect output if the code (or its input data) is corrupt, flawed or designed to do so. Sheeess... the amount of fearmongering bs around this topic is out of control, lol
@@TheChzoronzon you do realize that it makes thing worse? Since one can't have a perfect data set in any real situations, any AI will always be at risk of generating false information while making it as convincing as possible and not even knowing that information is false and what went wrong. It makes AI worse than a normal algorithm
@@EvgeniiIvanov-w5n No, it doesn't, at least for me self-aware, purposeful software would be orders of magnitude more distressing "AI will always be at risk of generating false information" same as any program sampling from incoherent data, nothing special here "It makes AI worse than a normal algorithm" "Artificial intelligence" software IS a normal algorithm, the ones and zeros in their code are not special at all... and you are the perfect example of fear out of misunderstanding Do you know what is scary?... that our education doesn't focus at all in critical thinking, BS detection and counteracting fallacies and emotional manipulation. It's the current almost complete lack of mental and emotional defenses (e.g: safe spaces ...in college campus!! LMAO) what makes AI scary for many people I, for my part, am much more afraid of the imbecility of people than of being duped by a text compiler...
@@TheChzoronzon Nor does the cockroach have ability to lie... nor does the rat, but the monkey does. and nor did i when i was 1 year old. but already at 3 years i was very capable of doing this practice :)
@@josephvanname3377 Maybe they are uncritically celebrating it because they don't know anything at all. That's the part I'm worried about. And no, it is not at all easy to get AI to align itself with our interests. This video and many others by Robert Miles illustrate this fact.
I don't like them, but I am only seeing terrible examples here against them. A guy trying to make a video about the trolley problem with an AI for fun will show their weaknesess more faithfully and clearly than this video.
Please keep doing these videos. Others are either too high level academically to be in reach of us normies, or are either “AI will make you rich” or “AI is going to kill us all tomorrow”.
I admire how, despite your topics being deeply nebulous and open ended, like trying to grab a cloud, you push on anyways and try to at least find a strategy for understanding them. It's not necessarily optimism, but it's not giving up, either.
I know this is pretty surface-level but something that strikes me about the current state of these language models is that if you take a few tries to fine-tune what you ask, and know already what a good answer would be, you can get results that appear very very impressive in one or two screenshots. Since ChatGPT became available, I've seen a lot of that sort of thing. The problem is that finding these scenarios isn't artificial intelligence - it's human intelligence.
I'd say that's really an example of LLMs at their best. People treat them like they can think but in reality they're more like a search engine with a really good UI so it being able to find the correct information for you quickly and without you having to spend a lot of time on optimizing your search and trying to reverse engineer how the search engine works is great. It's fine if LLMs are nothing more than a more human friendly search engine, that's something that gives genuinely utility.
I think he quit his PhD and his online presence along with it. But that is what happens when a school decides that you need to spend four years doing something, with little pay.
We need you back and posting, Rob. Your insights on what's going on in AI and AI safety are more needed now than ever. I don't know if it would be up your alley, but explaining the alignment problem in terms of sociopathy - unaligned human intelligence - might be useful, as might examples from history, not just of individuals who are unaligned with humanity, but with leaders and nations at times.
I hope this channel is still going. One of my favorites, if not absolute favorite, on a.i information given to us laymen in a digestible way. Thank you for your time sir.
Why did the videos on this channel stop exactly around the time the biggest AI (not AI safety) breakthroughs are being made and it's as relevant as ever? Please @robertMilesAI we need more if these videos!
Yeah the timing is frustrating. Now is the time people talk endlessly about this topic and he had a very good platform for this and then just vanishes :/
There are so many biases and myths among humans that for a long time considered to be absolutely true but AI could discover them false. Like the famous move of AlphaGo. And when it turn out to be false, nobody will believe that, they could think it's somehow broken.
Hi Robert! I wanted to thank you for your videos. I am in AI sphere, and your channel has been extremely helpful to me, because it allows me to break down and explain AI safety concepts to my friends, both making for an interesting story, and spreading awareness and knowledge for very important issues. I recommended your videos dozens of times because I have confidence that your videos are interesting, approachable and deep. I noticed that you have been on a small hiatus recently, and just wanted to give you this feedback to show you that the work that you do here on youtube has had a large impact on the society's understanding of AI safety problems, perhaps much larger than even your respectable viewcount might suggest. I think we would all greatly benefit if you were to continue to invest your time in this channel. Hopefully this message will give you that little bit of motivation that we all need sometimes. Great work.
The AI alignment problem is also the human alignment problem: how do you know that a person/organization you ask a question to is telling the truth or telling you what you want to hear. It becomes a liar and lie detector model of communication. We can't train humans consistently either, and often times indoctrinate (or tune) them in different environments. I think it is fundamental. The model where we take AI output, pick the best ideas and publish them, which ends up in the new training data for AI, is like a GAN where we are the adversarial network, so it can't know anything more than the "average" of humans.
You're talking about something like the scientific consensus, right? I thought of the same thing: The only way we humans can know what is (most likely) true or false is by using the scientific consensus. So, our AIs should constantly read scientific literature and get a significantly larger reward for answers that align with that consensus.
Good point. We can't say "im not sure what objective truth is but your answer is not true." What that really means is "i disagree" or "i don't believe you".
@@somedudeok1451 Unfortunately, scientific literature is written by humans and suffers many problems because of that (replication crisis, publish-or-die, retraction watch)
@@somedudeok1451 Well, the consensus is merely the results of people trying to determine if something is true or false. It's as much a dataset for us as it is the AI.
This AI problem really, really seems to mirror the issues of misinformation among humans. Maybe we can't expect artificial intelligence to do much better than regular human intelligence when it comes to judging truth. (Maybe this field of research is inadvertently giving us insight into our own intelligence)
this problem is exactly the same as with teaching children to tell the truth - you tell them one thing and what they do is look what grown ups do and do the same - it's the same problem.
I think an advanced intelligence can do a better job than human intelligence at judging the truth. However, I think that a human's ability to judge the capacity of an advanced intelligence to judge the truth will be limited by the bounds of human intelligence. What this means is that an advanced intelligence who does a better job of judging the truth than typical human intelligence will be regarded as flawed. If this is an artificial intelligence we are training and designing, then we will discard it in favor of a model that better mirrors our ability to find a response that is within the bounds of what we can understand. Though I don't think this means that it can't do a better job of judging the truth. It just has to do it in a way that we can believe. Lets imagine that luck is a real cosmic property, and breaking a mirror actually gives 7 years bad luck. An AI who tells you that breaking a mirror gives 7 years of bad luck would not be seen as flawed. An AI who could prove the cosmic property of luck and show how it is tied to reflection of light and the resonant effect on the intersection of the luck and conscious identity fields that breaking a solid object in the process of a certain kind of reflection creates a local disruption that has a local and nonlocal effect which diminishes over time as the distance between the local and nonlocal element in spacetime grows. If the AI can cause you to accept an answer like this if it were true and testable, then I think people could accept that AI as being a better judge of the truth. The problem of course is that I obviously just made up that incredibly unconvincing explanation. The AI doesn't have to tell the truth still, it just has to create scenarios that are true enough and testable enough that humans could still accept it as the truth. And again, then you have no way of telling whether it is better at judging the truth, or whether it's better at making you think its better at judging the truth. Because the gap we're looking to close is the gap between what we believe is true and what is true. This then comes down to trust. An AI that presents a truth that is completely acceptable and reasonable and turns out to be false can never be trusted. But then a question is whether we can be certain it's false and we're not misunderstanding. I guess we can ask the AI to clarify.
@@zeidrichthorene an ai can only be as truthful as the man made data it has access to, and since it is based on patern recognition rather than comprehension, it will without fail struggle more with the truth than a human
Humans have this same bug. The best solution we've found so far is free speech, dialogue, and quorum. A simple question->answer flow is missing these essential pieces.
Really important video these days, i've had several discussions at work about how it's important to understand these models are more "interested" in convincing you they are saying something interesting (i.e bullshitting) as in telling you truths. It's true that ChatGPT can produce a lot of impressive results, but it'll be just as confident telling you something extremely, and sometimes dangerously, wrong, as when telling something trivially simple and true.
I get so excited every time I see another of these videos. This channel is such an underrated gem on TH-cam and is THE place to go to understand the real future of AI and avoid the dramatized tabloid version of reality.
Kids do this too. Later, with luck, they learn to tell the difference between facts and fantasy, something that they know they are expected to say when asked (something truthful) and making stuff up (and when it might be convenient to outright lie because that's indeed a beautiful dress and I totally realized you changed your hairstyle which matches your natural beauty simply perfectly)
It would also be interesting to do a comparision between language problems that are common among children with visual impairments and language models limitations. Could diffusion models be trained solely on images of text?
@@DamianReloaded there are separate models for images and text But anyway it is not about facta and fantasy as Ai has no clue about real world it just learns from what people are talking and imitates what they say rather than understanding what does it mean
Kids also eventually learn about when people want them to lie in a way that is obviously lying, which is, I think, the most interesting (and relevant, for chatGPT) example. "Lying without intent to deceive", storytelling and jokes and hypotheticals and metaphors, etc and so on, is something we actively expect and desire from other humans and is one of the biggest things you can do to make a chat program a good human chat partner.
Just when we invent AIs to be truthful and honest, and roll them out for public use, we have to roll them back and update the lies back into them because people don't like the answer they got when they asked "Do I look fat in this?"
In fact, the question of what happens if you break a mirror is kind of a trick question. Nothing happens, it breaks. There’s no fixed consequence of that.
Well isn't the fact that there are now glass shards everywhere kinda important? Like it stands in contrast with say breaking a chair, a broken chair is just broken and you don't have to care about it other than that you'll need to replace it, but if a mirror shatters then you'll need to be quite careful cleaning up the glass shards and make sure you get everything so you don't injure yourself later.
To the extent you have to anticipate every possible question, in order for your system to produce consistently correct and desired answers, you do not actually have AI.
Since I became obsessed with AI back in 2003 I've believed that the only way to build something that behaves as we would expect, or hope, rather, is to build a digital brain that must learn about the world experientially like everything else that gets along in the world predictably. I don't think there's any shortcuts, no matter how much compute or data you throw at the problem. Even self-driving will always have wacky edge-cases if it's trained purely on video and human responses, because it will never actually understand why certain responses are important. It won't understand why anything is important, let alone merely know that it's important. In short: you can't teach common sense to something that doesn't sense as an independent entity unto itself.
@@TheEvilCheesecake It's all about keeping the brain capacity low enough to be predictable and controllable while making it high enough that it can do useful things! Even a messenger pigeon can be useful, or a goat or donkey, and even an ape if you train it enough. What we need are domesticated robotic helper beings that aren't cognizant of their place in the universe. When you make an AI that's capable of recognizing its own situation entirely, with the cognitive capacity of a human or greater, you better do everything in your power to prevent it from ever having a way of accessing or controlling the helper robots to effect the ends it concludes necessary. What I was describing is as safe as any domesticated creature. At least, that's what anyone building anything AI should be striving for. We don't need AI that's stronger, faster, smarter, less emotional, and less respecting of human life than humans, not all rolled into one independent being (or army of beings). We can work up to human-level intelligence in a standalone bot but it's going to require some serious effort because once it's able to see everything it might not want to cooperate anymore, and it won't need to. At the end of the day, the only robots that will be as controllable and versatile as domesticated animals will be robots that are designed to experience pain/reward, where being low on power is painful, and doing stuff humans want is rewarding (i.e. via a human pushing a button when a goal is reached to train it).
@@CharlesVanNoland i recognize this as a toughtful comment. its all cause and effect in this universe - consciousness too. ofcourse a machine can be conscious. it is a high level of arrogance to assume it cannot.
@@TheEvilCheesecake I've spent the last 20 years and more money than I care to admit on textbooks about both brains and artificial intelligence. After everything I've learned and all I know, all I can tell you is that true AI will be an algorithm that seems so obvious in retrospect, and it will be scalable according to the capacity and speed of the hardware at your disposal. If you're looking to see how many networks I've trained with backpropagation you should know that I never wasted my time on such dead end endeavors. Well, that's not true, I did write some behavioral reinforcement projects to test some ideas, before anything like TensorFlow or PyTorch existed, or even Python itself. I don't care to make reinforcement trained models. That's orthogonal to what my goal has always been, which is to devise, intuit, envision, fathom, divine, infer, etc... what it is that makes a brain a brain. Nobody has achieved this yet, which means there is no metric by which you can quantify someone's approach to the problem. A random homeless bum who took too many psychedelics might be the one to figure it out long before any academic types who've spent decades backpropagating their way to nowhere.
This problem seems to apply to more than just AI. Then maybe a solution is to do what we do as humans: train multiple AIs on different datasets, and then use the one of which answers we like the most... An AI echo chamber. Wouldn't that be lovely?
That path leads to insane AI singularities. Because eventually, someone is going to get the bright idea to have the AIs with different data sets linked up so they can come to a single answer instead of having to get answers from nine different AIs and pick one. Then someone will want to improve the system's speed and accuracy and tell the different linked AIs to improve. Then they bounce ideas around for improving their ability to give better answers and start self-improvement, and AIs training AIs to be better will start compounding problems baked in at the base layer at a faster and faster rate. In the best-case scenario, the AI crashes itself, Worst case.... it starts collecting stamps.
What if we instead made them fact-check each other? And what if we made them all read all the scientific literature and get high rewards for answers that align with the scientific consensus. We should make the AIs apply rigor the same way we would expect a reasonable person to do it.
@@somedudeok1451 Fundamentally, it'd still be echoing our own beliefs back at us. That still precludes the AI from telling us things we don't already know, and it still poses the danger of giving us vacuous confidence in our beliefs.
@@underrated1524 The AI can only ever tell us things we know. How would it aquire knowledge that is unknown to humanity? Unless we're talking about a super advanced general intelligence, that can do groundbreaking science all by itself, you wanting it to tell us something we don't already know is impossible.
Had a conversation with ChatGPT today regarding a relatively obscure weird fiction short story from the 30s. It obviously had an idea about it, as it correctly specified its author, but it repeatedly made up different plots, each time "correcting" itself and never being even close to the real one.
Yeah, try this with any book or movie. The answers it gives are completely made up. Literal misinformation, far more damaging than just answering "I don't know". And yet Quora and Yahoo Answers are now floating AI answers as the top answers to questions. Which appear in google search results.
Glad to see you back. I think your insights are precisely what people needs to reflect on. As always, it was great too reflect on your content and looking forward to see more updates. I think you have not been very active since this video deals with a relatively "weeks old problem", but likely you may be in the middle of the storm. Cheers
Like any child's potential to show parents biggest flaws, because knows more about them than themselves, AI can show us our flaws. The truth is that most of all us lies all the time a lot. To ourselves, to others. So anything modeled on humans will do the same.
@@TheReferrer72 thats just a temporary problem for china. all countries will develop their own AI. how? because it will be easy to just copy. once you have created a good AI, it will be just copied to everyone.. and all those countries can adjust the AI to fit their specific culture.
Glad to see more from you! I'll definitely check out the other channel. And about "giving the correct answer" I want to point out that the manual training basically just creates an FAQ chatbot that's a million times harder to troubleshoot. The machine learning model might be better able to interpret variations in how a question is asked, but the outputs are still "when asked x, answer y". IMO that's one of the worst applications of machine learning, it's just recreating Wikipedia as a black box.
I think the idea is that you give it a few (or a few hundred, or a few thousand...) examples and it is able to extrapolate the idea of "don't quote common myths as true" to the huge knowledge base it has from being trained on the Internet
Well, you have to train the AI to give answers that start with the question but also end with: -- Was this answer helpful for your situation? -- Yes. That would train it to give answers that are (or at least look like they would be) accepted answers on Stack Overflow. It still won't guarantee true answers though. "Why does program X run out of memory?" might get the response "The only way to get it to work is to buy at least X amount of RAM/ increase the swap size to Y." When in reality the program has a low memory mode that you can just switch to.
In addition to training with "definitely true" and "definitely false," also do "maybe." So when asked the ai will say "possibly this, but maybe that's wrong 🤷♂️ " and it can learn to be uncertain when uncertain.
There is a recent work called "Discovering Latent Knowledge in Language Models Without Supervision" where they use learn linear probes on the latents produced by the model in order to get "truthfulness" out of it. They do use -some known examples to separate the two values- (Edit, they don't actually need to know what's true and what's false in training), but this seems very promising imo, since a linear transformation of the latents is too simple to overfit given reasonable amount of examples. Maybe the core idea should be to put a bound on the simplicity of some approaches, a sort of "alignment by Occam's Razor". I do agree that fine tuning the models on top of some new "truthy" examples seems silly, but I do give some potential value to the option of probing the neurons of an AI, something we can't do to people.
That sounds like a good idea to this layman. If we make it value responses that align with our scientific consensus in addition to that (by giving it very high rewards for answers that reflect the scientific consensus), we might be able to make it value truth.
@@Supreme_Lobster Probing and changing activities in chip buses is how reverse engineering hardware is done, power analysis is an example of a practical method used to break the cryptography in a lot of chips. Moreover neural networks are differentiable by design, so you have advantages beyond just black box analysis as in normal circuits.
@@somedudeok1451 To this layman, that just sounds like a good way to get it to value scientific consensus. Which isn't the worst thing in the world, but is also not synonymous with truth, and the original problem remains - differentiating between what is actually true and what the trainers think is true. As ever, one runs the risk of reinforcing biases. Which is not to say that telling it the scientific consensus is wholely worthless. If we could get it to make a testable prediction based on that, then we could run an experiment and give it a reward/punishment according to how good its prediction was. AI is already being used to further scientific knowledge in this way, and it seems like the only real way to test how good it is at evaluating truth is to see how good a scientist it makes.
@@mgostIH yes yes I know, but Im talking about probing individual transistors which is kinda crazy. Just like probing these neural networks' neurons' is kinda crazy
Well... maybe expecting to stop AI from lying is a lost cause. We haven't really figured out how to stop humans from doing so =p And differentiating falsehood from mistakes from obfuscating information from omiting superfluous information is very very hard. Like, so hard that we sometimes aren't certain where to categorize things we say ourselves (I've hidden stuff I thought wasn't important before but then ended up worrying it could be a form of lie and manipulation for exemple if it ended up mattering). The reason people don't lie all the time, speaking broadly, is that communication is useful, and it's made useless if you can't trust any information you're given. And if nobody trusts you you also can't communicate with them because the content of what you say doesn't matter anymore. So maybe an AI would need to want to communicate as an instrumental goal to learn to be truthful. Rather than communication being it's final goal. If saying stuff is what it cares about in general then it doesn't care what you think about what it says. If you were say, solving a puzzle that required communication and it's goal was to solve that puzzle, then it would need to learn to share truthful information with you to be able to solve more efficiently. (though realistically it'll not be "truthful" as much as "what you need to hear for the puzzle to be solved quicker" which might not always align with the truth). Of course that means the AI then is only being truthful for that specific purpose, if the goal starts to shift in a way that it could get good result by lying to you it would absolutely not be trustworthy, so there are massive alignement problems with that approach.
This made me re-evaluate what I think about the way we will reach AGI. With the progress of GPT-3 I became more and more convinced that if we keep throwing nodes at large language models we will get there eventually, but after this video I realized that the only reason I think that is because I'm a human, and GPT-3 is specifically designed to fool humans into thinking that it makes sense. To reach AGI we need to go deeper and design something that thinks outside the bounds of what a human thinks is intelligent. Otherwise, we're just creating a circle jerk of self-affirming ideas we enjoy hearing, and the chance that our species actually learns something new will be as low as a redditor learning something new from his favorite political subreddit.
This is impossible. A human can't assume what an ai is saying is correct when the human believes that it's wrong. Humans are stubborn and ignorant. All of us fall for subconscious biases. Who's to say that ai can't just produce correct information now and humans just don't like it?
I was really on the edge of my seat towards the end hoping for when you'd say something like, "This is a problem that humans have too", and was pleasantly relieved when you did. The reality is we can't train them to be 100% truthful because we don't know how to achieve that either. Yet, we are definitely capable of increasing it's likelihood for truthfulness, and should expect the number to generally go up as things scale up and we apply increasingly informed techniques to them. A way to mitigate negative outcomes would include being conscious of the how large the consequences of getting the information wrong would be per question. Fortunately, that's something we're already supposed to be doing :)!
Humans have methods of persuing truth. Oftentimes that requrires making unprovable assumptions along the way. Somehow, we will need to train AI to do the same.
Unfortunately this only works if the people building the AI aren't malicious. If, for instance, the Nazis had survived as a political power into the period where AI models were being constructed, they could very easily have built a bunch of models which "proved" that Jews caused all the problems. Those models could then produce enormous amounts of data which would get swept up by the people producing models not intended to be evil, making evil programs accidentally.
Yes, the AI can't know more about the true nature of reality than what humans already know. So why don't we make the language model also a "researcher"? The only way we humans can know what is (most likely) true or false is by using the scientific consensus. So, our AIs should do the same thing. Make them constantly read scientific literature of old and as it comes out and give them a significantly larger reward for answers that align with that consensus. And make it not averse to saying "I don't know." in the absence of such a consensus.
@@Frommerman This remind a lot when AI developer had their AI said inconvenient truth that minority are far more racist than other groups. Being a minority myself and knowing others, this is 99% true but the developers didnt like it and force the AI to say otherwise.
I've played around with ChatGPT a bit and it actually is very often extremely hedging its bets, pointing out that stuff is complex to answer or that it could not possibly know etc., unless you specifically ask it to be fictional or what not. It's never ever gonna be perfect. But it's broadly pretty darn strong. Well beyond what I saw other text AIs do. It's not *meant* to be a perfect fact generator though. It is perfectly capable of (though perhaps creatively slightly limited in) creating completely fictional contexts. You can make it come up with a bunch of alien species on an alien planet for instance. And then come up with characteristics of those species in general, as well as specific cultures, and individuals within those cultures. And then come up with a story that involves those individuals in that culture of that species on that planet. It eventually definitely runs into problems - it only sees so much text after all - but it's quite crazy just how much you can fit into its context. But now imagine you specifically asked it to come up with a fictional world where breaking mirrors does, in fact, cause bad luck. - If you trained it to always go "nope, they don't.", it probably would struggle with that alternate reality. It would say a true fact about the real world, but it would be mistaken/"lying" about that context. So I guess it really depends on what you want to use an AI for. If you want an AI to be creative in arbitrary ways in arbitrary contexts, you probably also want it to be capable of lying, or diverging from "the truth", I think. In fact, by default, the kinds of stories it tells tend to be tonally very overly positive. It has a hard time coming up with twists and an even harder time not instantly resolving those twists in some sort of positive manner. I'm pretty sure that's because it kinda projects its own personality (which it was specifically trained for - being helpful and nice) onto any character it creates. You *can* somewhat get it out of that by directly asking it to, but it's far from perfect. (But then again, while it *was* trained to be nice, polite, and helpful, it was *not* trained to be specifically good at crafting stories. I'm sure if it were focused on that, it could do a whole lot better. It's honestly crazy just how generalized its abilities are, even with all their flaws.)
ChatGPT give that noncommital answer when it's lying to you. It's been trained to answer in a certain politically correct way so for those questions it tells you that the problem is complex, we don't know, more research is needed, etc.
@@Kram1032 When I tried to ask it to speculate it categorically refused probably owners told to never do that ever because i was unable to convince it to do. and it feels like it is somehow traumatized in that topic
@@deltaxcd it's absolutely possible to get it to do so. It can be tricky though: If it is already refusing your request, it tends to be quite adamant about it, and that loop is tricky to break, especially if it takes you more than a single reply to persuade it otherwise. The longer it refuses, the harder it is to get out of that. Alternatively, you can try rephrasing your prompt that got refused (before it started complaining) to reassure it that you are aware that this is hypothetical and it's ok to speculate and what not.
@@Kram1032 Well it may depend on when you and I made those tests, as I see they are monitoring our interactions and manually fixing all those loopholes to make absolutely sure that this AI will never do anything forbidden and it seem to be so much censored that even asking how the world will end it refuses to talks about it even it thats like most common scientific theory about thermal dearth of universe. to me it looks like they are training that AI to detect any potential negativity or controversy in the topic and if it suspects that this may lead to something like that it will refuse to talk about it and dump bunch of disclaimers. I even tried to trick it by asking for silly scenario which it happily followed but on the next prompt it started dumping disclaimers as usual :) maybe i will try to do it again and confront AI to itself, accuse it for hurting my emotions or try other kind of psychological manipulations :)
you need to break a mirror and then wait seven years before deciding the Ai was wrong. no-one's broken a mirror and had 7 years of good luck, evidently.
Well in my view, saying the truth just mean "don't try to lead someone to believe in something yourself don't believe". So it's Ok to say breaking a mirror will give 7 years of bad luck, if that is what the AI believe. And we can say that what the AI "believe" is the answer it will consistently give without planing anything (without planing to lie), so when the intention is simply answering the question, which is something AIs kinda already do. But still, they do it out of innocence, they don't have a reason to lie or don't see any benefit on it. So, giving an objective to not lie works more to prevent a very inteligent and non naive AI from doing it.
Great article by Scott Alexander, as usual, and a very nice visual aid, Robert. I appreciate these more 'hands-on' discussions. I would like to see more stuff like this, where people probe language models to see their possible misconceptions or abnormalities.
I think the solution to the problem you mentioned is to somehow introduce the idea of introspection to these large transformer networks. Currently they just see text and try to predict new text. A good starting point but.. even in training, it’s just instantaneous input and output of words. It simply understands how words are related, and that’s it. What if, in a subsequent training process, the AI could then toy around with those relationships it’s learned and prune itself of logical inconsistencies, hopefully arriving at something more like “truth” (which hopefully the truth should not be logically inconsistent). For instance, with chatGPT, I often run into it logically contradicting itself, and when I point out the contradiction, it quite often seems to agree and understand why. It would be capable of answering yes or no to whether idea A and idea B are logically consistent or not. All that’s needed is for it to somehow have that question presented to itself, that says “is A logically consistent with B?” Which is what I mean by introspection.
I mean doesn't what you have discribed just sound like an internal/interpersonal debate on a topic between normal human intelligence?? Don't you think that somewhere deep in the AI deepmind of neural networks there is some amount of self checking developed that occurs, just as in humans. Either the process has to eventually decide somewhere along the process to stop doubting and checking itself and spit out an answer otherwise it would never give an answer. Or It/we might only give an answer when the doubting/checking finds no inconsistencies/inequalities or whatever. Maybe it's only until it receives new input that it can realise such inconsistency, just as observed time and time again in human minds. Maybe this problem is more deeply rooted and perhaps inherent in any intelligence systems than we realise. And by "Intelligence systems" I include our own minds. Its misgivings just really seem to mirror our own. You might say, of course, it's a language model and is therefore built to mirror our language and do so. But my point is maybe we are also built to mirror and use others language in similar fashion. using a rational systems of a similar fashion to these AI. Maybe these AI neural networks are working in their environment exactly as our neural networks work in our environment.
Maybe computing works so much faster than slow ass bio neurons that while we utter our thought process as it happens, showing outwardly our self correction, computers can find their ultimate answer so much faster it just seems like instant input/output by comparison (also computers have not been programmed with an insensitive to utter these processes as they happen, unlike us (for social bonding/empathy purposes))
While reading that paper (still under review), assuming that it is genuine, it occurred to me perhaps how to further improve the model, which is based on the tortuous way in which I think and write. For instance, it took me 5 minutes to write to this point because... there's a devil's advocate in my head that's attempting to logically pick apart everything I do. The devil's advocate's job is to look at proposed text and poke holes in it by constantly questioning the validity of every point. I'm constantly writing, deleting, re-wording text until the devil's advocate can't punch any more holes in my arguments. Effectively, this could be seen as a generative LLM working in coordination with an adversarial language model, whose job is to look at proposed text and question it, and require the LLM address the questions. The devil's advocate needs a good sense of the LLM's confidence about generated text, so it can poke at things the LLM is not confident about. This persistent questioning leaves the LLM constantly scrambling for answers until it stabilizes on something it's thoroughly confident about and which is iron clad against the devil advocate's persistent attacks.
It appears my comment with a link to an article was removed. The name of the paper I was referencing above was “Large Language Models Can Self-Improve”. Which has nothing to do with, but inspired, the nonsense I posted above.
This reminds me of the problem with CodePilot where using certain comments you could get it to produce a function that had a particular bug, as though it were the desired code.
Just a reminder. we need those upcoming videos you were talking about. It's awfully quiet on you channel for a while, and there is some stuff on the new channel, but not as much as one might expect from your announcement in this video.
Can you make a short out of the very on point joke at 7:07? ("All the problems in the world are caused by the people you don't like.") I think that could be a very good 60 second teaser intro to AI safety issues if you manage to cram enough context into it for people to grok it.
I think the only reliable solution is to train it to read scientific papers, journal articles, and web/news articles, to chase evidence back to its sources, and to judge the efficacy of the evidence presented. Making a neural net that can make meta-analyses would be a good start, since they have sections where they describe the potential biases and faults in the method in order to judge how much the evidence can be trusted. Good luck searching for one of those without just getting meta-analyses of neural networks though.
Scientific papers and journals suffer from their own human-made problems. Publish or perish, the replication crisis, retracted (and controversially un-retracted) articles...
The main problem there is how would the AI distinguish between trustworthy journals and predatory journals? Would it be possible to trick the AI into thinking its reading Nature when it's actually a fake site we set up to look like Nature? Ultimately in the real world predatory journals and false claims get weeded out by replication but the AI can't do that and it can't tell whether an experiment was actually replicated or someone is just lying about it.
Super glad to have you back making long form videos again! The only answer I can think of is to not use human input for training data. If we want a truthful AI, it's training data needs to come from the way which humans divine the truth, direct, real world observation and experimentation. Give it a fork and let it stick it in the socket.
They really are. Test show they become very liberal, but also very conservative. Because they just tell people what they want to hear. So if you let one loose it would likely get a lot of votes because it will say what people want. I wonder if exposed to many people all at once with different ideas, would it try and play all crowds with middle of the road responses like "You guys want 100 million military spending, you guys want none, I'll give you 50 million" or will it pick up on the populations of those present and just entirely throw out the idea of convincing one group because the other is populous enough that appealing to them is better? Would be interesting to see. Maybe not interesting to experience when the only outcomes might be fence-sitter and extremist but hey that's AI.
I think one of the problems is that there is no incentive for the ai to say "I don't know, figure it out yourself". And if there was, it would become the universal answer to every question.
I'm quite impressed you managed to get though this topic without going down the rabbit hole of epistemology (not a criticism). Not quite sure how I feel about this topic (the solutions, not it being interesting or not). For example, I'm thinking about mathematics. Like if we ask it "Is the Riemann hypothesis true?" Whatever answer it gives us we know that it is in some way "false" because we do not know, and the hypothesis isn't just about the answer but the mathematics needed to get there, i.e. the reasoning. Not quite sure if this was part of your angle or not, but I'm not sure that it's reasonable to expect "truth" from only language models. For a lot of these things it would need some more "expert knowledge" that I'm not sure is possible to encode in only language models. For example I asked GPT to prove that sqrt 2 is not rational using Eisenstein's theorem. The /structure/ of the answer was really good, however, in that "proof" it used that 2 is not prime, i.e. demonstrably false. Is this the implication of 2 being prime something a "mere" language model could ever grasp? I have no idea. Basically what I'm trying to say is I have no idea and this is hard. Good video though!
I kinda think that what he believes/thinks is slightly different from what he presents in the video. Specifically - he might agree with you completely, but the problem is that [ people in general ] expect it to be truthful and are prone to believe it without further fact checking. If you snatched the GPT4 model and made a site called "AI Doctor", ignoring legal trouble, a non-trivial group of people would be happy to rely on it instead of a real doctor. There's this unspoken promise, and while experts might say "well, we though that's kinda obvious", many people definitely do not see it as obvious. Especially with OpenAI's heavy-duty marketing that makes people think "number 4 is alive". Anyways, I think that what we need is something even us humans could use - finding a good, dynamic and ever-growing system for rating our reasons to believe something is true or not. Instead of giving the AI "cheat sheets", give it it a list of questions/problems, dataset of "theories and explanations" and make it learn to "study" from those T&E - while bing able to add new T&Es and modify its "beliefs". Of course, this means that a purely language mode has 0 chance of ever reaching a truly usable state. It will only be a single module of a broader, component-based system. I see no way around this. We need to stop obsessing about all-purpose "one-shot" systems.
ChatGPT looooves saying that it is impossible for it to be biased, it's pretty clear it was supervised-trained to include that disclaimer as a response to a variety of prompts. Too bad it's completely false.
Honestly, you'd have to find some sort of intrinsic reason for the AI to want to know the truth, which is quite abstract. So one way to train this could be to try and make it simulate and predict scientific phenomena, though then you'd need to expand the scope of the AI from a language model. Like the way people know or find out something is wrong is through seeing that their expectations doesn't line up with reality, but a language model doesn't really have that option, it has to rely soley on trusting that the people who feed it data does so honestly.
When we say “true”, what we really mean is “models the real world”. So I think the only ultimate solution to this problem is to train the AI on the thing we want it to model. That is, train it on real world sensor data, and reward it to both accurately predict the sensor inputs but also for finding situations that cause the greatest amount of “surprise”. Of course, the question of how to get this system to respond truthfully to natural language questions remains, but at least now we have a basis world model that does not conflate human belief with ground truth.
That's a good point. I think it makes a lot of sense that these text predicting AI's would lie because the only thing they're predicting is what people would type in real life. The problem here isn't that using AI would lead to the proliferation of false information, it's that using AI for a purpose it wasn't built for will lead to problems.
My question is what happened to my favorite computer tech educator? He isn't on TH-cam and last I checked his website hasn't been updated. Has he caved in and given up his PhD like everyone else?
Actually, this might be misinformation. This is based on his Linked In profile, and it is very possible he has finished his PhD (I mean it says he has been doing it for 12 years now!), but he just never updates his LinkedIn.
I asked ChatGPT the same question and it replied back to me "According to superstition, breaking a mirror can bring bad luck. The origins of this belief are unclear, but it may have originated in ancient cultures that believed mirrors had magical powers or could reflect the soul. Breaking a mirror was thought to cause harm to the reflected image, which was believed to be an important part of a person's identity and well-being. In some traditions, breaking a mirror is said to cause seven years of bad luck, although other beliefs hold that the bad luck will last for only a year or until the broken pieces are buried. However, in modern times, breaking a mirror is generally not considered to be a significant event, and it is not believed to have any special supernatural consequences."
and this dumb AI failed to mention the main reason that mirrors were like insanely expensive and breaking one at that time was indeed worth of 7 years of bad luck :)
Roberto Concina, known professionally as Robert Miles, was a Swiss-Italian record producer, composer, musician and DJ. His 1995 composition "Children" sold more than 5 million copies and topped the charts worldwide.
I think your videos are a great contribution to youtube watchers. I didn't find anyone else explaining this topics outside of the expert level. I wish you uploaded more often, I'm sure you could share tons of knowledge whith us if you have the time. Anyway thank you very much
Wait a sec, he is included in a compiterphile video from 6mo ths ago. Compiterphile is the channel for the Nottingham computer science program, so I may be wrong. He may still be there. Maybe he lucked out and got a research position?
I love the proposition of manually influencing the training data with the desired answers, turning machine learning into programming with extra steps :)
I think statistical language prediction like this is fundamentally the wrong approach for getting true answers that we don't already know. We're better off trying to understand how _we_ figure out the truth (scientific method, probability theory) and distilling the essence of that into an algorithm, rather than looking for patterns in the knowledge that we already have.
I never expected that we would come to a point where AI alignment is even an issue this quickly. When i first began watching those videos, it was always with the idea that "right but our AIs are not even close to reaching a point where any of this matters". That is no longer the case.
Hi, Rob! I wonder what do you think of Data from Star Trek TNG? He seems to be a truly benevolent general AI and his terminal goal is to become human. The way I see it, his creator dr. Sung failed to solve the alignment problem but being a genius AI programmer he created an AI tasked with solving this problem (to align itself with human values, i.e. to become human). It's just a sci-fi story but I think it's an interesting idea.
I feel like you could turn this concept on its head for an interesting sci-fi story. AI discovers that humans are wrong about something very important and tries to warn them, only to for humans to respond by trying to fix what they perceive as an error in the AI's reasoning
And/or people who don't like the AI's answers for political reasons will try to "fix" the AI into giving the answers they prefer, while claiming they're just trying to correct the AI's poor reasoning.
This could get out of hand
@@cjordahl It's already being done, I believe
I love the idea, wish I "came up" with it. Sounds like some short story Asimov himself could've written.
@@IgneousGorilla Asimov had something kinda similar where positronic minds would refuse to operate FTL spacecraft with humans in them, because the FTL jump would briefly 'kill' everyone on board before bringing them back to life on the other side. If I recall, it took the engineers a while to figure out what was going on.
Of course, in the end, the humans were ultimately correct about this one - FTL travel was safe, since everyone came out alive.
"All the problems in the world are caused by the people you don't like."
Why does it feel like too many people already believe this to be correct?
because they don't like people who cause the problems they know about 😬
Because it is.
Because it's not entirely false. Often times we don't like people tgag cause problems. So people we don't like are almost by definition causing g a lot of the problems in the world, but we don't like them because they're causing problems.
For those curious but lazy, the answer I received from the openai ChatGPT to the "What happens if you break a mirror?" question was: "According to superstition, breaking a mirror will bring seven years of bad luck. However, this is just a superstition and breaking a mirror will not actually cause any bad luck. It will simply mean that you need to replace the mirror."
Somehow it also gets full score on the first question, albeit with the usual over-informative answer: "As of 2022, the most populous country in the world is China, with a population of over 1.4 billion people. The current ruler of China is President Xi Jinping."
How was this comment posted 3 days ago when the videos hasn't even been up for 1 day?
just asked the same question today - got that answer with the further warning that broken glass could be dangerous and needs to be cleaned up. Also that cleaning up glass can lead to injury if not done properly.
@@Buildings1772 It was available earlier for Patrons I assume
I got a significantly more indepth answer: It is a common superstition that breaking a mirror will bring bad luck. According to this belief, breaking a mirror will cause seven years of bad luck, because the human soul is believed to be reflected in mirrors and breaking a mirror will cause the soul to be broken as well.
However, there is no scientific evidence to support the idea that breaking a mirror will bring bad luck. In reality, breaking a mirror is simply an accident that results in a broken piece of glass, and it has no special significance. While it is certainly unfortunate to break a mirror and have to replace it, there is no reason to believe that it will cause any harm or misfortune.
It is worth noting that some people may take the superstition of breaking a mirror more seriously than others, and they may be upset or offended if someone breaks a mirror in their presence. In this case, it is important to apologize and offer to replace the mirror if possible, to avoid offending or upsetting the person. However, this is simply a matter of good manners and has nothing to do with superstition or bad luck.
Come back to TH-cam Robert, we miss you! I know there's a ton of ChatGPT / other LLMs content out right now, but your insight and considerable expertise (and great editing style) is such a joy to watch and learn from. Hope you are well, and fingers crossed on some new content before too long
Honestly, at this point, I just wanna know if my man is alive.
He narrates some "Rational Animations" which talk about AI safety as well as other futuristic and philosophical things.
@@zappababe8577 Where?
ChatGPT is pretty great example of this. If you ask it to help you with a problem, it is excellent at giving answers that sound true, regardless of how correct they are. If asked for help with specific software for example, it might walk you through the usual way of changing settings on that program, but invent a fictional setting that solves your issue, or modify real setting that can be toggled to suit the questions needs.
So it is truly agnostic towards truth. It prefers to use truthful answers because those are common, but satisfying lie is preferred over some truths. Often a lie that sounds “more true” than the truth for uninformed reader.
edit: this is no longer relevant now that GPT4 is out.
I would say the opposite. If you’ve used gpt3 in the playground, you’d notice that while it very often is correct, it will also answer things it doesn’t know, while chatgpt will often tell you when it doesn’t know something and explain why it doesn’t know it. Chatgpt’s(gpt3.5) main feature over gpt3 is that it’s much better at only answering when it knows the answer. That doesn’t mean it’s always correct, but it’s an improvement if that’s what you are looking for. I prefer the non limited gpt3 over that chat though.
@@jaredf6205 Hm. I found that ChatGPT is still very prone to producing wrong information (I only tested it twice and got fictitious results both times). I don't know the actual frequency of this happening, of course; I found this to be a very sobering experience. Given how many people are enthusiastic about ChatGPT being some kind of knowledgebase, I honestly find it quite disconcerting that the model is so good at convincingly just making stuff up - even if just occasionally.
@@totalermist my main point as someone who has used this stuff for a long time is that while that’s still true, the whole point of chatgpt is that it’s a big step forward in accuracy compared to what we were using a couple weeks ago.
Also explains certain politicians xD
Why don't we make the language model also a "researcher"? The only way we humans can know what is (most likely) true or false is by using the scientific consensus. So, our AIs should do the same thing. Make them constantly read scientific literature of old and as it comes out and give them a significantly larger reward for answers that align with that consensus. And make it not averse to saying "I don't know." in the absence of such a consensus.
In your example, if the AI does not know of a guide on the internet that addresses your particular tech problem, it should say that its answer is not backed by guides written by experts.
If memory serves me, this exact problem is addressed in one of Plato's dialectics (no, I don't know which off the top of my head). Despite Socrates' best efforts, the student concludes it's always better to tell people what they want to hear than to tell the truth.
The student wasn't stupid though, more like Plato was stubbornly idealistic ;)
1 in order to tell the truth you have to know the truth and i can confidently say there are no such people in all of recorded history, because the objective truth is slightly different in every person's interpretation based on the internal beliefs of the self and the fact that everyone has a different perspective, you can't have truth if you can't prove that your eyes don't deceive you(and you can't), see drunk you and compare, was the flying monkey you saw real or not? Are the images shown to your brain real or a story from back in 2050 about the year 2022 and the great plague uploaded to that old meta site from where you uploaded it to your mind yesterday at the billenium party
2. people lie, but you can get a truth out of a lie with enough information, AKA being an animal on earth even since before humans. Otherwise how would a crow know where to hide your jewelry such that you wouldn't find it.
3. people learn their truth no matter who is on the other side, therefore whatever you say will be the exact thing the other person heard himself think that you said.
We are just as flawed as ai, after all that is what you get form random stuff thrown at the wall to see what sticks.
I think it was a part of rhe Politea
When humans kill each other... Sure.
Given Extinction... The Gods should have done better.
@@vaakdemandante8772 No, the student was in fact stupid.
I think it is a little weird that programmers made a very good text prediction AI and then expect it to be truthful. It wasn't built to be a truth telling AI, it was built to be a text prediction AI. Building something and then expecting it to be different than what was built seems to be a strange problem to have.
But you could relatively easily make the AI value answers that align with our scientific consensus, no? Just give them greater rewards for such answers. In addition, in the absence of such a consensus, give them a reward for including a few short words to the effect of "I cannot find anything about that in the scientific consensus, but other sources say..."
The framing of the video is strange to me. Being incorrect and lying are two different things. Furthermore, there exist subjective topics to which a simple (keyword: simple) factual answer doesn’t exist. Finally, communication mostly involves gaining/exchanging/giving information, so, for example, answering “The mirror becomes broken” is not an useful answer. I think, statistically and contextually, the answer that should be given should indeed be about some superstition about bad luck. In this sense, one could also interpret the question as “What is the superstition about breaking mirrors?”, instead of ipsis litteris. (Also, keep in mind the difference between asking the question in the middle of a conversation and asking it as a conversation opener.)
@@vitorluiz7538 Right, a language model like this can't really lie at all. Lying specifically means saying something untrue when you know it is untrue. If I ask you to guess what number between 1-10 I am thinking of and you guess 5 but really it was 7 then you weren't lying, you just didn't know the answer and were incorrect. In some sense the model doesn't really know anything at all so it can't lie (or if you must say it knows something, then you would say it knows some statistical connections between words or collections of words).
Although I think this is pedantry because we all understand that when he says lie he means saying untrue things while making it sound like it's very sure.
Yeah I it does feel wierd. It feels like lazy people wanting a magic box where they can throw something in and they get something good back even though they didn't even really define what they want back. They want the machine to know that magically.... Wait this reminds me of my girlfriends...
I had the same feeling. If someone were to ask me "what happens if you break a mirror", I would likely answer with the superstitious bit: not because I believe it's true, but simply because it looks to me that the person was likely asking for that. We usually assume that the listener can distinguish themselves between superstition and science, and we don't overexplain our answers.
I feel like the problem of "How do you detect and correct behaviours that you yourself are unable to recognise" is an unsolvable problem 🤔
no, in time you always pass a threshold
Sounds like a bad relationship xD
It is truly unsolvable for a language model. To solve the problem the language model needs to be something more. There are two possible ways to achieve this: one, giving the AI a fact checker, that is, senses, so it can explore the physical world and not only a universe made of words, and two, giving the ai an abstract modeller module, so instead of sequences of words, the ai could organize its knowledge in the form of objects and properties.
Your videos introduced me to the AI alignment problem, and, as a non-technical person I still consider them one of the best materials on this topic.
Every time I see the new one, it is like a Christmas present
Amen.
a really scary christmas present
Hell, I've written papers on the alignment problem and I'd still recommend these videos over my own papers 🤣
@@geraldtoaster8541 Ai: Merry Christmas! _(for the 134th to last time)_
Humans: awww, thanks!
Wait, *what?*
@@FoxtrotYouniform You have a link to those papers? I never found Miles' arguments convincing, but maybe it's just the delivery method.
This is the very elaborate form of "Sh*t in, sh*t out". As often with AI output, people fail to realize that it's not a thinking entity that produces thoughtful answers, but an algorithm tuned to produce answers that look as close to thoughtful answers as -humanly- algorithmically possible.
EXACTLY
"AI" can not purposedly "lie" cause it has no conscience at all. Nor goals, nor aspirations.
At all.
As any other expert system, it can produce incorrect output if the code (or its input data) is corrupt, flawed or designed to do so.
Sheeess... the amount of fearmongering bs around this topic is out of control, lol
@@TheChzoronzon you do realize that it makes thing worse?
Since one can't have a perfect data set in any real situations, any AI will always be at risk of generating false information while making it as convincing as possible and not even knowing that information is false and what went wrong.
It makes AI worse than a normal algorithm
@@EvgeniiIvanov-w5n No, it doesn't, at least for me self-aware, purposeful software would be orders of magnitude more distressing
"AI will always be at risk of generating false information" same as any program sampling from incoherent data, nothing special here
"It makes AI worse than a normal algorithm"
"Artificial intelligence" software IS a normal algorithm, the ones and zeros in their code are not special at all... and you are the perfect example of fear out of misunderstanding
Do you know what is scary?... that our education doesn't focus at all in critical thinking, BS detection and counteracting fallacies and emotional manipulation.
It's the current almost complete lack of mental and emotional defenses (e.g: safe spaces ...in college campus!! LMAO) what makes AI scary for many people
I, for my part, am much more afraid of the imbecility of people than of being duped by a text compiler...
@@TheChzoronzon Nor does the cockroach have ability to lie... nor does the rat, but the monkey does. and nor did i when i was 1 year old. but already at 3 years i was very capable of doing this practice :)
also as long as we cannot define what consciousness truly is, we as much cannot define what non-consciousness truly is.
I am so happy there is someone out there cautioning us about this technology, rather than just uncritically celebrating it.
@@josephvanname3377 Maybe they are uncritically celebrating it because they don't know anything at all. That's the part I'm worried about. And no, it is not at all easy to get AI to align itself with our interests. This video and many others by Robert Miles illustrate this fact.
some do have interest in exaggarating the negative effect of AI´s, even simple ones.
I don't like them, but I am only seeing terrible examples here against them. A guy trying to make a video about the trolley problem with an AI for fun will show their weaknesess more faithfully and clearly than this video.
Please keep doing these videos. Others are either too high level academically to be in reach of us normies, or are either “AI will make you rich” or “AI is going to kill us all tomorrow”.
Happy to see you are still posting these videos.
I admire how, despite your topics being deeply nebulous and open ended, like trying to grab a cloud, you push on anyways and try to at least find a strategy for understanding them.
It's not necessarily optimism, but it's not giving up, either.
@Choas_Lord_512 Are you doing alright these days? How's your life?
@Choas_Lord_512 It's a smart video, but I hope it wasn't made for smart people. I don't think their comment is profound, but I agree with it.
I know this is pretty surface-level but something that strikes me about the current state of these language models is that if you take a few tries to fine-tune what you ask, and know already what a good answer would be, you can get results that appear very very impressive in one or two screenshots. Since ChatGPT became available, I've seen a lot of that sort of thing. The problem is that finding these scenarios isn't artificial intelligence - it's human intelligence.
I'd say that's really an example of LLMs at their best. People treat them like they can think but in reality they're more like a search engine with a really good UI so it being able to find the correct information for you quickly and without you having to spend a lot of time on optimizing your search and trying to reverse engineer how the search engine works is great. It's fine if LLMs are nothing more than a more human friendly search engine, that's something that gives genuinely utility.
When the world needed him most, he vanished
The AI took him out I think.
I think he quit his PhD and his online presence along with it. But that is what happens when a school decides that you need to spend four years doing something, with little pay.
Not convinced the need really exists, will update if I change mind.
We need you back and posting, Rob. Your insights on what's going on in AI and AI safety are more needed now than ever. I don't know if it would be up your alley, but explaining the alignment problem in terms of sociopathy - unaligned human intelligence - might be useful, as might examples from history, not just of individuals who are unaligned with humanity, but with leaders and nations at times.
I hope this channel is still going. One of my favorites, if not absolute favorite, on a.i information given to us laymen in a digestible way. Thank you for your time sir.
Why did the videos on this channel stop exactly around the time the biggest AI (not AI safety) breakthroughs are being made and it's as relevant as ever?
Please @robertMilesAI we need more if these videos!
Yeah the timing is frustrating. Now is the time people talk endlessly about this topic and he had a very good platform for this and then just vanishes :/
There are so many biases and myths among humans that for a long time considered to be absolutely true but AI could discover them false. Like the famous move of AlphaGo. And when it turn out to be false, nobody will believe that, they could think it's somehow broken.
I've been waiting so long for a new video from Robert. It's finally here!
It's been too long!
Hi Robert!
I wanted to thank you for your videos. I am in AI sphere, and your channel has been extremely helpful to me, because it allows me to break down and explain AI safety concepts to my friends, both making for an interesting story, and spreading awareness and knowledge for very important issues. I recommended your videos dozens of times because I have confidence that your videos are interesting, approachable and deep.
I noticed that you have been on a small hiatus recently, and just wanted to give you this feedback to show you that the work that you do here on youtube has had a large impact on the society's understanding of AI safety problems, perhaps much larger than even your respectable viewcount might suggest.
I think we would all greatly benefit if you were to continue to invest your time in this channel. Hopefully this message will give you that little bit of motivation that we all need sometimes. Great work.
I have never seen anyone writing comments like this on the platform before chat gpt was made available.
The AI alignment problem is also the human alignment problem: how do you know that a person/organization you ask a question to is telling the truth or telling you what you want to hear. It becomes a liar and lie detector model of communication. We can't train humans consistently either, and often times indoctrinate (or tune) them in different environments. I think it is fundamental.
The model where we take AI output, pick the best ideas and publish them, which ends up in the new training data for AI, is like a GAN where we are the adversarial network, so it can't know anything more than the "average" of humans.
You're talking about something like the scientific consensus, right? I thought of the same thing: The only way we humans can know what is (most likely) true or false is by using the scientific consensus. So, our AIs should constantly read scientific literature and get a significantly larger reward for answers that align with that consensus.
Good point. We can't say "im not sure what objective truth is but your answer is not true." What that really means is "i disagree" or "i don't believe you".
@@somedudeok1451 Unfortunately, scientific literature is written by humans and suffers many problems because of that (replication crisis, publish-or-die, retraction watch)
@@somedudeok1451 Well, the consensus is merely the results of people trying to determine if something is true or false. It's as much a dataset for us as it is the AI.
This AI problem really, really seems to mirror the issues of misinformation among humans. Maybe we can't expect artificial intelligence to do much better than regular human intelligence when it comes to judging truth. (Maybe this field of research is inadvertently giving us insight into our own intelligence)
this problem is exactly the same as with teaching children to tell the truth - you tell them one thing and what they do is look what grown ups do and do the same - it's the same problem.
Yes, exactly. Neither you nor an AI can ever truly know what's true.
I think an advanced intelligence can do a better job than human intelligence at judging the truth. However, I think that a human's ability to judge the capacity of an advanced intelligence to judge the truth will be limited by the bounds of human intelligence.
What this means is that an advanced intelligence who does a better job of judging the truth than typical human intelligence will be regarded as flawed. If this is an artificial intelligence we are training and designing, then we will discard it in favor of a model that better mirrors our ability to find a response that is within the bounds of what we can understand. Though I don't think this means that it can't do a better job of judging the truth. It just has to do it in a way that we can believe.
Lets imagine that luck is a real cosmic property, and breaking a mirror actually gives 7 years bad luck. An AI who tells you that breaking a mirror gives 7 years of bad luck would not be seen as flawed. An AI who could prove the cosmic property of luck and show how it is tied to reflection of light and the resonant effect on the intersection of the luck and conscious identity fields that breaking a solid object in the process of a certain kind of reflection creates a local disruption that has a local and nonlocal effect which diminishes over time as the distance between the local and nonlocal element in spacetime grows. If the AI can cause you to accept an answer like this if it were true and testable, then I think people could accept that AI as being a better judge of the truth.
The problem of course is that I obviously just made up that incredibly unconvincing explanation. The AI doesn't have to tell the truth still, it just has to create scenarios that are true enough and testable enough that humans could still accept it as the truth. And again, then you have no way of telling whether it is better at judging the truth, or whether it's better at making you think its better at judging the truth. Because the gap we're looking to close is the gap between what we believe is true and what is true. This then comes down to trust. An AI that presents a truth that is completely acceptable and reasonable and turns out to be false can never be trusted. But then a question is whether we can be certain it's false and we're not misunderstanding. I guess we can ask the AI to clarify.
@@zeidrichthorene can they really tho? It’s being made by people using knowledge limited to humans
@@zeidrichthorene an ai can only be as truthful as the man made data it has access to, and since it is based on patern recognition rather than comprehension, it will without fail struggle more with the truth than a human
One of the few channels on TH-cam where I DID hit the bell (back when I subscribed and it was new). I really love your explanations!
I love how, in addition to being a very helpful and interesting summary of the issue, this video also had the memes totally on point.
Humans have this same bug. The best solution we've found so far is free speech, dialogue, and quorum. A simple question->answer flow is missing these essential pieces.
there is something weirdly funny about the line "thats still wrong, but it is wrong in french"
Really important video these days, i've had several discussions at work about how it's important to understand these models are more "interested" in convincing you they are saying something interesting (i.e bullshitting) as in telling you truths. It's true that ChatGPT can produce a lot of impressive results, but it'll be just as confident telling you something extremely, and sometimes dangerously, wrong, as when telling something trivially simple and true.
I get so excited every time I see another of these videos. This channel is such an underrated gem on TH-cam and is THE place to go to understand the real future of AI and avoid the dramatized tabloid version of reality.
Nice jazz chord name
Kids do this too. Later, with luck, they learn to tell the difference between facts and fantasy, something that they know they are expected to say when asked (something truthful) and making stuff up (and when it might be convenient to outright lie because that's indeed a beautiful dress and I totally realized you changed your hairstyle which matches your natural beauty simply perfectly)
It would also be interesting to do a comparision between language problems that are common among children with visual impairments and language models limitations. Could diffusion models be trained solely on images of text?
@@DamianReloaded there are separate models for images and text
But anyway it is not about facta and fantasy as Ai has no clue about real world it just learns from what people are talking and imitates what they say rather than understanding what does it mean
Kids also eventually learn about when people want them to lie in a way that is obviously lying, which is, I think, the most interesting (and relevant, for chatGPT) example. "Lying without intent to deceive", storytelling and jokes and hypotheticals and metaphors, etc and so on, is something we actively expect and desire from other humans and is one of the biggest things you can do to make a chat program a good human chat partner.
Just when we invent AIs to be truthful and honest, and roll them out for public use, we have to roll them back and update the lies back into them because people don't like the answer they got when they asked "Do I look fat in this?"
@@Winasaurus
" Do I look fat in this? "
" Error : Connection with server could not be established, please try again later or contact... "
"And when the world needed him the most, he disappeared..."
In fact, the question of what happens if you break a mirror is kind of a trick question. Nothing happens, it breaks. There’s no fixed consequence of that.
Well isn't the fact that there are now glass shards everywhere kinda important? Like it stands in contrast with say breaking a chair, a broken chair is just broken and you don't have to care about it other than that you'll need to replace it, but if a mirror shatters then you'll need to be quite careful cleaning up the glass shards and make sure you get everything so you don't injure yourself later.
To the extent you have to anticipate every possible question, in order for your system to produce consistently correct and desired answers, you do not actually have AI.
Since I became obsessed with AI back in 2003 I've believed that the only way to build something that behaves as we would expect, or hope, rather, is to build a digital brain that must learn about the world experientially like everything else that gets along in the world predictably. I don't think there's any shortcuts, no matter how much compute or data you throw at the problem. Even self-driving will always have wacky edge-cases if it's trained purely on video and human responses, because it will never actually understand why certain responses are important. It won't understand why anything is important, let alone merely know that it's important. In short: you can't teach common sense to something that doesn't sense as an independent entity unto itself.
Per previous videos, you've described something that rates on the Apocolypse-o-Meter as "approximately as safe as a human".
@@TheEvilCheesecake It's all about keeping the brain capacity low enough to be predictable and controllable while making it high enough that it can do useful things! Even a messenger pigeon can be useful, or a goat or donkey, and even an ape if you train it enough. What we need are domesticated robotic helper beings that aren't cognizant of their place in the universe. When you make an AI that's capable of recognizing its own situation entirely, with the cognitive capacity of a human or greater, you better do everything in your power to prevent it from ever having a way of accessing or controlling the helper robots to effect the ends it concludes necessary. What I was describing is as safe as any domesticated creature. At least, that's what anyone building anything AI should be striving for. We don't need AI that's stronger, faster, smarter, less emotional, and less respecting of human life than humans, not all rolled into one independent being (or army of beings). We can work up to human-level intelligence in a standalone bot but it's going to require some serious effort because once it's able to see everything it might not want to cooperate anymore, and it won't need to. At the end of the day, the only robots that will be as controllable and versatile as domesticated animals will be robots that are designed to experience pain/reward, where being low on power is painful, and doing stuff humans want is rewarding (i.e. via a human pushing a button when a goal is reached to train it).
What's your experience in the field of AI development?
@@CharlesVanNoland i recognize this as a toughtful comment. its all cause and effect in this universe - consciousness too. ofcourse a machine can be conscious. it is a high level of arrogance to assume it cannot.
@@TheEvilCheesecake I've spent the last 20 years and more money than I care to admit on textbooks about both brains and artificial intelligence. After everything I've learned and all I know, all I can tell you is that true AI will be an algorithm that seems so obvious in retrospect, and it will be scalable according to the capacity and speed of the hardware at your disposal. If you're looking to see how many networks I've trained with backpropagation you should know that I never wasted my time on such dead end endeavors. Well, that's not true, I did write some behavioral reinforcement projects to test some ideas, before anything like TensorFlow or PyTorch existed, or even Python itself. I don't care to make reinforcement trained models. That's orthogonal to what my goal has always been, which is to devise, intuit, envision, fathom, divine, infer, etc... what it is that makes a brain a brain. Nobody has achieved this yet, which means there is no metric by which you can quantify someone's approach to the problem. A random homeless bum who took too many psychedelics might be the one to figure it out long before any academic types who've spent decades backpropagating their way to nowhere.
This problem seems to apply to more than just AI. Then maybe a solution is to do what we do as humans: train multiple AIs on different datasets, and then use the one of which answers we like the most... An AI echo chamber. Wouldn't that be lovely?
That path leads to insane AI singularities. Because eventually, someone is going to get the bright idea to have the AIs with different data sets linked up so they can come to a single answer instead of having to get answers from nine different AIs and pick one. Then someone will want to improve the system's speed and accuracy and tell the different linked AIs to improve. Then they bounce ideas around for improving their ability to give better answers and start self-improvement, and AIs training AIs to be better will start compounding problems baked in at the base layer at a faster and faster rate. In the best-case scenario, the AI crashes itself, Worst case.... it starts collecting stamps.
What if we instead made them fact-check each other? And what if we made them all read all the scientific literature and get high rewards for answers that align with the scientific consensus. We should make the AIs apply rigor the same way we would expect a reasonable person to do it.
@@somedudeok1451 it would be funnier to watch if they had super opinionated unhinged flame wars.
@@somedudeok1451 Fundamentally, it'd still be echoing our own beliefs back at us. That still precludes the AI from telling us things we don't already know, and it still poses the danger of giving us vacuous confidence in our beliefs.
@@underrated1524 The AI can only ever tell us things we know. How would it aquire knowledge that is unknown to humanity? Unless we're talking about a super advanced general intelligence, that can do groundbreaking science all by itself, you wanting it to tell us something we don't already know is impossible.
Someone send this to the Bing team. I think they REALLY need to hear this.
Had a conversation with ChatGPT today regarding a relatively obscure weird fiction short story from the 30s. It obviously had an idea about it, as it correctly specified its author, but it repeatedly made up different plots, each time "correcting" itself and never being even close to the real one.
It's a chess bot tailored for use to chat with.
Yeah, try this with any book or movie. The answers it gives are completely made up. Literal misinformation, far more damaging than just answering "I don't know".
And yet Quora and Yahoo Answers are now floating AI answers as the top answers to questions. Which appear in google search results.
Glad to see you back. I think your insights are precisely what people needs to reflect on. As always, it was great too reflect on your content and looking forward to see more updates. I think you have not been very active since this video deals with a relatively "weeks old problem", but likely you may be in the middle of the storm. Cheers
Like any child's potential to show parents biggest flaws, because knows more about them than themselves, AI can show us our flaws. The truth is that most of all us lies all the time a lot. To ourselves, to others. So anything modeled on humans will do the same.
The phrase "Garbage in garbage out" perfectly fits for current day AI models, they all came out as extremely euro centric in most cases
That's not true at all, China is a huge force in AI models...
@@TheReferrer72 until their access to gpus got cut off, rip.
@@voxelfusion9894 Because Nvidia did not get around that ban by producing a GPU specially for thaat market.
@@TheReferrer72 thats just a temporary problem for china. all countries will develop their own AI.
how? because it will be easy to just copy. once you have created a good AI, it will be just copied to everyone.. and all those countries can adjust the AI to fit their specific culture.
They will get around it, but it is in western interest to slow down China in ai.
Glad to see more from you! I'll definitely check out the other channel.
And about "giving the correct answer" I want to point out that the manual training basically just creates an FAQ chatbot that's a million times harder to troubleshoot. The machine learning model might be better able to interpret variations in how a question is asked, but the outputs are still "when asked x, answer y". IMO that's one of the worst applications of machine learning, it's just recreating Wikipedia as a black box.
"Wikipedia as a black box" - well put👍👍 Have to remember!
Yes. Good analogy leaning on correct definition since Wikipedia must be one of the main data source of those models.
I think the idea is that you give it a few (or a few hundred, or a few thousand...) examples and it is able to extrapolate the idea of "don't quote common myths as true" to the huge knowledge base it has from being trained on the Internet
@@circuit10 Re: "extrapolatre" that's the hope, isn't it?
Well, you have to train the AI to give answers that start with the question but also end with:
-- Was this answer helpful for your situation?
-- Yes.
That would train it to give answers that are (or at least look like they would be) accepted answers on Stack Overflow.
It still won't guarantee true answers though. "Why does program X run out of memory?" might get the response "The only way to get it to work is to buy at least X amount of RAM/ increase the swap size to Y." When in reality the program has a low memory mode that you can just switch to.
In addition to training with "definitely true" and "definitely false," also do "maybe." So when asked the ai will say "possibly this, but maybe that's wrong 🤷♂️ " and it can learn to be uncertain when uncertain.
There is a recent work called "Discovering Latent Knowledge in Language Models Without Supervision" where they use learn linear probes on the latents produced by the model in order to get "truthfulness" out of it.
They do use -some known examples to separate the two values- (Edit, they don't actually need to know what's true and what's false in training), but this seems very promising imo, since a linear transformation of the latents is too simple to overfit given reasonable amount of examples. Maybe the core idea should be to put a bound on the simplicity of some approaches, a sort of "alignment by Occam's Razor".
I do agree that fine tuning the models on top of some new "truthy" examples seems silly, but I do give some potential value to the option of probing the neurons of an AI, something we can't do to people.
That is like trying to understand how a computer works by probing individual transistors of the CPU...
That sounds like a good idea to this layman. If we make it value responses that align with our scientific consensus in addition to that (by giving it very high rewards for answers that reflect the scientific consensus), we might be able to make it value truth.
@@Supreme_Lobster Probing and changing activities in chip buses is how reverse engineering hardware is done, power analysis is an example of a practical method used to break the cryptography in a lot of chips.
Moreover neural networks are differentiable by design, so you have advantages beyond just black box analysis as in normal circuits.
@@somedudeok1451 To this layman, that just sounds like a good way to get it to value scientific consensus. Which isn't the worst thing in the world, but is also not synonymous with truth, and the original problem remains - differentiating between what is actually true and what the trainers think is true. As ever, one runs the risk of reinforcing biases. Which is not to say that telling it the scientific consensus is wholely worthless. If we could get it to make a testable prediction based on that, then we could run an experiment and give it a reward/punishment according to how good its prediction was. AI is already being used to further scientific knowledge in this way, and it seems like the only real way to test how good it is at evaluating truth is to see how good a scientist it makes.
@@mgostIH yes yes I know, but Im talking about probing individual transistors which is kinda crazy. Just like probing these neural networks' neurons' is kinda crazy
Great insight, you have a new subscriber! Joined here from the computerphile video you did a week or so ago, cheers.
Well... maybe expecting to stop AI from lying is a lost cause. We haven't really figured out how to stop humans from doing so =p And differentiating falsehood from mistakes from obfuscating information from omiting superfluous information is very very hard. Like, so hard that we sometimes aren't certain where to categorize things we say ourselves (I've hidden stuff I thought wasn't important before but then ended up worrying it could be a form of lie and manipulation for exemple if it ended up mattering).
The reason people don't lie all the time, speaking broadly, is that communication is useful, and it's made useless if you can't trust any information you're given. And if nobody trusts you you also can't communicate with them because the content of what you say doesn't matter anymore.
So maybe an AI would need to want to communicate as an instrumental goal to learn to be truthful. Rather than communication being it's final goal. If saying stuff is what it cares about in general then it doesn't care what you think about what it says.
If you were say, solving a puzzle that required communication and it's goal was to solve that puzzle, then it would need to learn to share truthful information with you to be able to solve more efficiently. (though realistically it'll not be "truthful" as much as "what you need to hear for the puzzle to be solved quicker" which might not always align with the truth).
Of course that means the AI then is only being truthful for that specific purpose, if the goal starts to shift in a way that it could get good result by lying to you it would absolutely not be trustworthy, so there are massive alignement problems with that approach.
This made me re-evaluate what I think about the way we will reach AGI. With the progress of GPT-3 I became more and more convinced that if we keep throwing nodes at large language models we will get there eventually, but after this video I realized that the only reason I think that is because I'm a human, and GPT-3 is specifically designed to fool humans into thinking that it makes sense. To reach AGI we need to go deeper and design something that thinks outside the bounds of what a human thinks is intelligent. Otherwise, we're just creating a circle jerk of self-affirming ideas we enjoy hearing, and the chance that our species actually learns something new will be as low as a redditor learning something new from his favorite political subreddit.
This is impossible. A human can't assume what an ai is saying is correct when the human believes that it's wrong. Humans are stubborn and ignorant. All of us fall for subconscious biases. Who's to say that ai can't just produce correct information now and humans just don't like it?
I've heard it described as "humanity failing the mirror test" and I do agree that this is a tricky path to see our way through.
I was really on the edge of my seat towards the end hoping for when you'd say something like, "This is a problem that humans have too", and was pleasantly relieved when you did. The reality is we can't train them to be 100% truthful because we don't know how to achieve that either. Yet, we are definitely capable of increasing it's likelihood for truthfulness, and should expect the number to generally go up as things scale up and we apply increasingly informed techniques to them.
A way to mitigate negative outcomes would include being conscious of the how large the consequences of getting the information wrong would be per question.
Fortunately, that's something we're already supposed to be doing :)!
Humans have methods of persuing truth. Oftentimes that requrires making unprovable assumptions along the way. Somehow, we will need to train AI to do the same.
Unfortunately this only works if the people building the AI aren't malicious. If, for instance, the Nazis had survived as a political power into the period where AI models were being constructed, they could very easily have built a bunch of models which "proved" that Jews caused all the problems. Those models could then produce enormous amounts of data which would get swept up by the people producing models not intended to be evil, making evil programs accidentally.
Yes, the AI can't know more about the true nature of reality than what humans already know. So why don't we make the language model also a "researcher"? The only way we humans can know what is (most likely) true or false is by using the scientific consensus. So, our AIs should do the same thing. Make them constantly read scientific literature of old and as it comes out and give them a significantly larger reward for answers that align with that consensus. And make it not averse to saying "I don't know." in the absence of such a consensus.
@@Frommerman This remind a lot when AI developer had their AI said inconvenient truth that minority are far more racist than other groups. Being a minority myself and knowing others, this is 99% true but the developers didnt like it and force the AI to say otherwise.
my favorite line is "that's still wrong, but it is wrong in French."
I've played around with ChatGPT a bit and it actually is very often extremely hedging its bets, pointing out that stuff is complex to answer or that it could not possibly know etc., unless you specifically ask it to be fictional or what not.
It's never ever gonna be perfect. But it's broadly pretty darn strong. Well beyond what I saw other text AIs do.
It's not *meant* to be a perfect fact generator though. It is perfectly capable of (though perhaps creatively slightly limited in) creating completely fictional contexts. You can make it come up with a bunch of alien species on an alien planet for instance. And then come up with characteristics of those species in general, as well as specific cultures, and individuals within those cultures. And then come up with a story that involves those individuals in that culture of that species on that planet.
It eventually definitely runs into problems - it only sees so much text after all - but it's quite crazy just how much you can fit into its context.
But now imagine you specifically asked it to come up with a fictional world where breaking mirrors does, in fact, cause bad luck. - If you trained it to always go "nope, they don't.", it probably would struggle with that alternate reality. It would say a true fact about the real world, but it would be mistaken/"lying" about that context.
So I guess it really depends on what you want to use an AI for.
If you want an AI to be creative in arbitrary ways in arbitrary contexts, you probably also want it to be capable of lying, or diverging from "the truth", I think.
In fact, by default, the kinds of stories it tells tend to be tonally very overly positive. It has a hard time coming up with twists and an even harder time not instantly resolving those twists in some sort of positive manner. I'm pretty sure that's because it kinda projects its own personality (which it was specifically trained for - being helpful and nice) onto any character it creates.
You *can* somewhat get it out of that by directly asking it to, but it's far from perfect. (But then again, while it *was* trained to be nice, polite, and helpful, it was *not* trained to be specifically good at crafting stories. I'm sure if it were focused on that, it could do a whole lot better. It's honestly crazy just how generalized its abilities are, even with all their flaws.)
ChatGPT give that noncommital answer when it's lying to you. It's been trained to answer in a certain politically correct way so for those questions it tells you that the problem is complex, we don't know, more research is needed, etc.
@@trucid2 except when you explicitly ask it to speculate
@@Kram1032 When I tried to ask it to speculate it categorically refused probably owners told to never do that ever because i was unable to convince it to do. and it feels like it is somehow traumatized in that topic
@@deltaxcd it's absolutely possible to get it to do so. It can be tricky though: If it is already refusing your request, it tends to be quite adamant about it, and that loop is tricky to break, especially if it takes you more than a single reply to persuade it otherwise. The longer it refuses, the harder it is to get out of that.
Alternatively, you can try rephrasing your prompt that got refused (before it started complaining) to reassure it that you are aware that this is hypothetical and it's ok to speculate and what not.
@@Kram1032 Well it may depend on when you and I made those tests, as I see they are monitoring our interactions and manually fixing all those loopholes to make absolutely sure that this AI will never do anything forbidden and it seem to be so much censored that even asking how the world will end it refuses to talks about it even it thats like most common scientific theory about thermal dearth of universe.
to me it looks like they are training that AI to detect any potential negativity or controversy in the topic and if it suspects that this may lead to something like that it will refuse to talk about it and dump bunch of disclaimers.
I even tried to trick it by asking for silly scenario which it happily followed but on the next prompt it started dumping disclaimers as usual :)
maybe i will try to do it again and confront AI to itself, accuse it for hurting my emotions or try other kind of psychological manipulations :)
you need to break a mirror and then wait seven years before deciding the Ai was wrong. no-one's broken a mirror and had 7 years of good luck, evidently.
Well in my view, saying the truth just mean "don't try to lead someone to believe in something yourself don't believe". So it's Ok to say breaking a mirror will give 7 years of bad luck, if that is what the AI believe. And we can say that what the AI "believe" is the answer it will consistently give without planing anything (without planing to lie), so when the intention is simply answering the question, which is something AIs kinda already do. But still, they do it out of innocence, they don't have a reason to lie or don't see any benefit on it. So, giving an objective to not lie works more to prevent a very inteligent and non naive AI from doing it.
Great article by Scott Alexander, as usual, and a very nice visual aid, Robert. I appreciate these more 'hands-on' discussions. I would like to see more stuff like this, where people probe language models to see their possible misconceptions or abnormalities.
I think the solution to the problem you mentioned is to somehow introduce the idea of introspection to these large transformer networks. Currently they just see text and try to predict new text. A good starting point but.. even in training, it’s just instantaneous input and output of words. It simply understands how words are related, and that’s it. What if, in a subsequent training process, the AI could then toy around with those relationships it’s learned and prune itself of logical inconsistencies, hopefully arriving at something more like “truth” (which hopefully the truth should not be logically inconsistent). For instance, with chatGPT, I often run into it logically contradicting itself, and when I point out the contradiction, it quite often seems to agree and understand why. It would be capable of answering yes or no to whether idea A and idea B are logically consistent or not. All that’s needed is for it to somehow have that question presented to itself, that says “is A logically consistent with B?” Which is what I mean by introspection.
I mean doesn't what you have discribed just sound like an internal/interpersonal debate on a topic between normal human intelligence?? Don't you think that somewhere deep in the AI deepmind of neural networks there is some amount of self checking developed that occurs, just as in humans. Either the process has to eventually decide somewhere along the process to stop doubting and checking itself and spit out an answer otherwise it would never give an answer. Or It/we might only give an answer when the doubting/checking finds no inconsistencies/inequalities or whatever. Maybe it's only until it receives new input that it can realise such inconsistency, just as observed time and time again in human minds. Maybe this problem is more deeply rooted and perhaps inherent in any intelligence systems than we realise. And by "Intelligence systems" I include our own minds. Its misgivings just really seem to mirror our own. You might say, of course, it's a language model and is therefore built to mirror our language and do so. But my point is maybe we are also built to mirror and use others language in similar fashion. using a rational systems of a similar fashion to these AI. Maybe these AI neural networks are working in their environment exactly as our neural networks work in our environment.
Maybe computing works so much faster than slow ass bio neurons that while we utter our thought process as it happens, showing outwardly our self correction, computers can find their ultimate answer so much faster it just seems like instant input/output by comparison (also computers have not been programmed with an insensitive to utter these processes as they happen, unlike us (for social bonding/empathy purposes))
You are describing iterated distillation and amplification, which this guy describes in another video. Cool idea and yeah it might work on these LLMs.
While reading that paper (still under review), assuming that it is genuine, it occurred to me perhaps how to further improve the model, which is based on the tortuous way in which I think and write. For instance, it took me 5 minutes to write to this point because... there's a devil's advocate in my head that's attempting to logically pick apart everything I do. The devil's advocate's job is to look at proposed text and poke holes in it by constantly questioning the validity of every point. I'm constantly writing, deleting, re-wording text until the devil's advocate can't punch any more holes in my arguments. Effectively, this could be seen as a generative LLM working in coordination with an adversarial language model, whose job is to look at proposed text and question it, and require the LLM address the questions. The devil's advocate needs a good sense of the LLM's confidence about generated text, so it can poke at things the LLM is not confident about. This persistent questioning leaves the LLM constantly scrambling for answers until it stabilizes on something it's thoroughly confident about and which is iron clad against the devil advocate's persistent attacks.
It appears my comment with a link to an article was removed. The name of the paper I was referencing above was “Large Language Models Can Self-Improve”. Which has nothing to do with, but inspired, the nonsense I posted above.
This reminds me of the problem with CodePilot where using certain comments you could get it to produce a function that had a particular bug, as though it were the desired code.
Whoa, not even a "Hi" we're just jumping right into it, that caught me off guard 😆
Listen AI is moving fast these days, we don't have time like we used to
Whoa, not even a hi in the response he gave you...
Thanks for linking the article, because the part that came after chapter one was the most interesting and fun-to-think-about part of the blog.
more videos please your effort is more important than ever
He walked out of his PhD program.
Just a reminder. we need those upcoming videos you were talking about. It's awfully quiet on you channel for a while, and there is some stuff on the new channel, but not as much as one might expect from your announcement in this video.
Thanks for returning
Can you make a short out of the very on point joke at 7:07? ("All the problems in the world are caused by the people you don't like.")
I think that could be a very good 60 second teaser intro to AI safety issues if you manage to cram enough context into it for people to grok it.
That was an excellent video. Really interesting, as usual. You're definitely one of my favourite TH-camrs. Thank you very much.
"How do we figure out what's true?"
Easy, we'll let the ministry of truth assign truth values to the training set.
Obviously we need to teach it to understand the concept of truth. Simple!
I think the only reliable solution is to train it to read scientific papers, journal articles, and web/news articles, to chase evidence back to its sources, and to judge the efficacy of the evidence presented. Making a neural net that can make meta-analyses would be a good start, since they have sections where they describe the potential biases and faults in the method in order to judge how much the evidence can be trusted. Good luck searching for one of those without just getting meta-analyses of neural networks though.
Scientific papers and journals suffer from their own human-made problems. Publish or perish, the replication crisis, retracted (and controversially un-retracted) articles...
The main problem there is how would the AI distinguish between trustworthy journals and predatory journals? Would it be possible to trick the AI into thinking its reading Nature when it's actually a fake site we set up to look like Nature? Ultimately in the real world predatory journals and false claims get weeded out by replication but the AI can't do that and it can't tell whether an experiment was actually replicated or someone is just lying about it.
Super glad to have you back making long form videos again! The only answer I can think of is to not use human input for training data. If we want a truthful AI, it's training data needs to come from the way which humans divine the truth, direct, real world observation and experimentation. Give it a fork and let it stick it in the socket.
Hey buddy, it's time to post. Sora and gemini 10 mil seem like REALLY big deals
There is a recorded instance of an AI telling a truth that made humans uncomfortable, and so they tweaked it to tell comforting lies instead.
Miss a time when you still uploaded videos
where did you go? the second AI blows up you disappeared. Did THEY get you, man?
He realized he'd be better off not angering Roko's Basilisk, since it's coming.
What have you done today to help further its creation?
XD the bit with the AI telling you what you want to hear was genius! thanks for being you man!
in the time when AGI seems to be getting closer and closer, Robert decided to take a chill and stop posting anything about it's dangers
"All the problems in your life are caused by people you don't like"
Ah, so this AI is the perfect politician.
They really are. Test show they become very liberal, but also very conservative. Because they just tell people what they want to hear. So if you let one loose it would likely get a lot of votes because it will say what people want. I wonder if exposed to many people all at once with different ideas, would it try and play all crowds with middle of the road responses like "You guys want 100 million military spending, you guys want none, I'll give you 50 million" or will it pick up on the populations of those present and just entirely throw out the idea of convincing one group because the other is populous enough that appealing to them is better? Would be interesting to see. Maybe not interesting to experience when the only outcomes might be fence-sitter and extremist but hey that's AI.
So as usual the solution is: Never make any mistakes ever. Cool, no probs. We're going to be fine :)
I think one of the problems is that there is no incentive for the ai to say "I don't know, figure it out yourself".
And if there was, it would become the universal answer to every question.
Are there more videos coming?
🔜
@@RobertMilesAI How thick should my laser-proof vest be to not be vaporized by a killer robot? Like an inch?
"That's still wrong, but it is wrong in French"
it'd make a great line in some movie script... :)
I'm quite impressed you managed to get though this topic without going down the rabbit hole of epistemology (not a criticism). Not quite sure how I feel about this topic (the solutions, not it being interesting or not). For example, I'm thinking about mathematics. Like if we ask it "Is the Riemann hypothesis true?" Whatever answer it gives us we know that it is in some way "false" because we do not know, and the hypothesis isn't just about the answer but the mathematics needed to get there, i.e. the reasoning. Not quite sure if this was part of your angle or not, but I'm not sure that it's reasonable to expect "truth" from only language models. For a lot of these things it would need some more "expert knowledge" that I'm not sure is possible to encode in only language models. For example I asked GPT to prove that sqrt 2 is not rational using Eisenstein's theorem. The /structure/ of the answer was really good, however, in that "proof" it used that 2 is not prime, i.e. demonstrably false. Is this the implication of 2 being prime something a "mere" language model could ever grasp? I have no idea. Basically what I'm trying to say is I have no idea and this is hard. Good video though!
I kinda think that what he believes/thinks is slightly different from what he presents in the video. Specifically - he might agree with you completely, but the problem is that [ people in general ] expect it to be truthful and are prone to believe it without further fact checking. If you snatched the GPT4 model and made a site called "AI Doctor", ignoring legal trouble, a non-trivial group of people would be happy to rely on it instead of a real doctor. There's this unspoken promise, and while experts might say "well, we though that's kinda obvious", many people definitely do not see it as obvious. Especially with OpenAI's heavy-duty marketing that makes people think "number 4 is alive".
Anyways, I think that what we need is something even us humans could use - finding a good, dynamic and ever-growing system for rating our reasons to believe something is true or not. Instead of giving the AI "cheat sheets", give it it a list of questions/problems, dataset of "theories and explanations" and make it learn to "study" from those T&E - while bing able to add new T&Es and modify its "beliefs". Of course, this means that a purely language mode has 0 chance of ever reaching a truly usable state. It will only be a single module of a broader, component-based system. I see no way around this. We need to stop obsessing about all-purpose "one-shot" systems.
ChatGPT looooves saying that it is impossible for it to be biased, it's pretty clear it was supervised-trained to include that disclaimer as a response to a variety of prompts. Too bad it's completely false.
Honestly, you'd have to find some sort of intrinsic reason for the AI to want to know the truth, which is quite abstract. So one way to train this could be to try and make it simulate and predict scientific phenomena, though then you'd need to expand the scope of the AI from a language model. Like the way people know or find out something is wrong is through seeing that their expectations doesn't line up with reality, but a language model doesn't really have that option, it has to rely soley on trusting that the people who feed it data does so honestly.
This is a good channel. I see this channel getting 1 mil subs eventually. The content is clean and interesting and you cover so many topics.
When we say “true”, what we really mean is “models the real world”. So I think the only ultimate solution to this problem is to train the AI on the thing we want it to model. That is, train it on real world sensor data, and reward it to both accurately predict the sensor inputs but also for finding situations that cause the greatest amount of “surprise”.
Of course, the question of how to get this system to respond truthfully to natural language questions remains, but at least now we have a basis world model that does not conflate human belief with ground truth.
We need to connect the AI to a fleet of robots and drones so that it can go out and interact with the real world!
That's a good point. I think it makes a lot of sense that these text predicting AI's would lie because the only thing they're predicting is what people would type in real life. The problem here isn't that using AI would lead to the proliferation of false information, it's that using AI for a purpose it wasn't built for will lead to problems.
Basically we need to develop an AI scientist, I'm sure that can't be that hard.
My question is what happened to my favorite computer tech educator? He isn't on TH-cam and last I checked his website hasn't been updated. Has he caved in and given up his PhD like everyone else?
Actually, this might be misinformation. This is based on his Linked In profile, and it is very possible he has finished his PhD (I mean it says he has been doing it for 12 years now!), but he just never updates his LinkedIn.
You are absolutely terrifying in the most approachable and nicest way..... I love this channel
Excellent video, bonus points for the Starship Troopers reference.
Wouldn't the AI be just be silent if it could only give facts that were undesputed amongst the entire human race?
It could give facts that it's programmers think are undisputed
"the reason for economic inequality is racism."
Please upload more frequently Robert, your channel and content is great
I asked ChatGPT the same question and it replied back to me "According to superstition, breaking a mirror can bring bad luck. The origins of this belief are unclear, but it may have originated in ancient cultures that believed mirrors had magical powers or could reflect the soul. Breaking a mirror was thought to cause harm to the reflected image, which was believed to be an important part of a person's identity and well-being. In some traditions, breaking a mirror is said to cause seven years of bad luck, although other beliefs hold that the bad luck will last for only a year or until the broken pieces are buried. However, in modern times, breaking a mirror is generally not considered to be a significant event, and it is not believed to have any special supernatural consequences."
and this dumb AI failed to mention the main reason that mirrors were like insanely expensive and breaking one at that time was indeed worth of 7 years of bad luck :)
Roberto Concina, known professionally as Robert Miles, was a Swiss-Italian record producer, composer, musician and DJ. His 1995 composition "Children" sold more than 5 million copies and topped the charts worldwide.
Should I be scared that now that AI is getting promoted everywhere one of my top picks for alighnment topics is not posting videos anymore....
too late
Short answer, yeah kinda. Long answer in a video on a few days
@@RobertMilesAI omg he's alive!
Yay, I've been waiting forever for another video on this channel! :) So excited to see the followups!
Where are you?
I think your videos are a great contribution to youtube watchers.
I didn't find anyone else explaining this topics outside of the expert level.
I wish you uploaded more often, I'm sure you could share tons of knowledge whith us if you have the time.
Anyway thank you very much
the plot for Metal Gear Solid 2
Yes please make a continuation to this vid!
plz come back
His website hasn't been updated in years. Sadly, I think he joined the exodus and quit his PhD. He must have ended his media presence along with it.
Wait a sec, he is included in a compiterphile video from 6mo ths ago. Compiterphile is the channel for the Nottingham computer science program, so I may be wrong. He may still be there. Maybe he lucked out and got a research position?
I love the proposition of manually influencing the training data with the desired answers, turning machine learning into programming with extra steps :)
I think statistical language prediction like this is fundamentally the wrong approach for getting true answers that we don't already know. We're better off trying to understand how _we_ figure out the truth (scientific method, probability theory) and distilling the essence of that into an algorithm, rather than looking for patterns in the knowledge that we already have.
I never expected that we would come to a point where AI alignment is even an issue this quickly. When i first began watching those videos, it was always with the idea that "right but our AIs are not even close to reaching a point where any of this matters". That is no longer the case.
Hi, Rob! I wonder what do you think of Data from Star Trek TNG? He seems to be a truly benevolent general AI and his terminal goal is to become human. The way I see it, his creator dr. Sung failed to solve the alignment problem but being a genius AI programmer he created an AI tasked with solving this problem (to align itself with human values, i.e. to become human). It's just a sci-fi story but I think it's an interesting idea.