It's great that she's bringing this topic up, as surely civilisation as a whole wouldn't want AI tech to lie in the hands of a few mega corporations inside a black box. She's fighting for transparency.
This reminds me of the Einstellung Effect, where what you've learned in the past makes you blind to more obvious solutions. The AI expects that every piece of information it gets is relevant, and that's why it sees that "Oh, one quantity is associated with another quantity. Based on the data I have, I (incorrectly) infer that the two are directly proportional and that one quantity getting increased by five times means the other quantity would be increased by five times." (I did ask ChatGPT, and it did give this reasoning). In fact, the water jug thing is very similar to the experiment used to showcase the Einstellung Effect. Participants were asked to measure a certain amount of water using three jugs. Half of the participants were given 5 practice problems, in which the answer was always "Fill Jug B, pour from it into Jug A, then pour from it into Jug C, empty Jug C and pour into Jug C again; the water remaining in Jug B is the amount you want." So when participants were given values that could only be solved via simpler methods (like just adding the amounts of Jugs A and C), they were blind and couldn't figure it out. You can also compare this to the famous question of, "I bought a pencil and an eraser for $1.10. If the pencil cost $1 more than the eraser, how much does the eraser cost?" A surprising number of people will say 10 cents, when the answer is 5 cents.
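For anyone who wants to check the pencil-and-eraser arithmetic, here is a minimal sketch in Python. It works in whole cents to avoid floating-point noise, and the function name `solve_prices` is made up purely for illustration.

```python
def solve_prices(total_cents: int, difference_cents: int) -> tuple[int, int]:
    """Return (pencil, eraser) prices in cents.

    Solves: pencil + eraser = total and pencil - eraser = difference.
    """
    # Adding the two equations gives 2 * pencil = total + difference.
    pencil, remainder = divmod(total_cents + difference_cents, 2)
    if remainder:
        raise ValueError("no whole-cent solution")
    eraser = total_cents - pencil
    return pencil, eraser


pencil, eraser = solve_prices(110, 100)
print(f"pencil: {pencil} cents, eraser: {eraser} cents")
```

The intuitive answer fails the same check: a 10-cent eraser would make the pencil $1.10 and the total $1.20.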
Yes, and "make abstraction" is not a physical process. It is a mental or "spiritual" process that some creatures are (for no reason known to science) capable of implementing in physical form (text, speech, gestures, etc.). Machines cannot do this without a program written to mimic the process using past data people have generated.
Seems like an opportunity to change how we educate people. Fact regurgitation is now less valuable. Common sense, or better yet critical thinking, is more important than ever. I wish Yejin had addressed ways to stop AI from running amok and how to “tag” its output to reveal the origin as inhuman (versus relying on humans to explicitly label their AI-generated material).
Good point. The consuming public needs to use its own intelligence, common sense, and moral, political, and religious values to decide for itself whether to accept everything, reject everything, or take some middle ground, and how much it wants to be impacted by AI.
My impression was, this is what European schools and universities teach people. Not medical students, but others definitely. Less fact mountains, more "make the most of a small amount of facts"
An inspiring talk that gets people thinking about what else may be needed on top of "scale is all you need". E.g. let AI actively form its own hypotheses (the scientific hypotheses we learnt about in high school) and verify them itself.
A machine will never “experience” anything. It has no nervous system, no veins, no brain. It may register degrees of touch etc. but does not “feel”. Let’s use our remarkable innovations recreatively and not waste resources, material and non-material, on absurd projects. What criteria validate common sense? Who decides? All the primary senses distinguish “difference”. It seems that cognition of common sense is not yet up to that.
0:53: 🤔 Artificial intelligence is a powerful tool but still has limitations and challenges. 3:44: 🤔 AI passing the bar exam does not mean it is robust at common sense. 6:43: 🤖 Common sense is crucial for AI to understand human values and avoid harmful actions. 10:00: 📚 AI systems are powered by specialized data crafted and judged by human workers, which should be open and publicly available for inspection. 13:47: 🤔 The speaker discusses the idea of common sense and the limitations of current language models in acquiring it. Recapped using Tammy AI
Notice how quickly we have normalized the idea of machines being able to learn. That itself is legit mind blowing. It took only a few weeks/months for humans to take for granted the current emergent "abilities" brought about by LLMs, and scoff at imperfections. Looking forward to more advancements in this exciting field!
Machine learning and neural networks are nothing new. Stock brokerages, merchandise warehousing and logistics, cyber security, and many other types of organizations have been using it for years. What is new is the democratization of AI that these LLMs have presented. I can now ask my phone to generate a new logo for my company with a few hints, and I get 4 possible options in seconds. This accessibility of the products of machine learning is what is getting the average user so excited (and the logo artists a little nervous).
The examples she gave are extreme cases from earlier days. I asked GPT-3.5 the same question and got much smarter answers. Like: If you can dry one cloth in 2 hours, and you have 12 cloths, it will take you 2 hours to dry each cloth separately, which means it will take a total of 24 hours to dry all 12 cloths. However, if you can dry multiple cloths simultaneously, you can reduce the time required to dry all 12 cloths. For example, if you can dry 3 cloths at a time, it will take you 2 hours to dry 3 cloths, so you would need to repeat the process 4 times to dry all 12 cloths. ……
Not really. I asked it one day after you. It takes 5 hours to dry 5 shirts. How long does it take to dry 3.5 shirts? If it takes 5 hours to dry 5 shirts, then we can say that the rate of drying is 1 shirt per hour. Using this rate, we can find out how long it would take to dry 3.5 shirts by setting up a proportion: 1 shirt / 1 hour = 5 shirts / 5 hours Multiplying both sides of the equation by 3.5, we get: 1 shirt / 1 hour = (5 shirts / 5 hours) * 3.5 Simplifying the right-hand side of the equation, we get: 1 shirt / 1 hour = 3.5 shirts / 1 hour So, it would take 3.5 hours to dry 3.5 shirts.
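The two readings argued over in this sub-thread (everything dries at once vs. one batch after another) can be put side by side in a small sketch. It assumes each batch takes the same fixed time no matter how many items are in it, and the `capacity` parameter (how many items fit on the line at once) is a hypothetical knob that the original question never specifies.

```python
import math


def drying_time(items: int, hours_per_batch: float, capacity: int) -> float:
    """Hours needed to dry `items` when at most `capacity` can dry at once."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    if items <= 0:
        return 0
    # Items beyond the capacity have to wait for a later batch.
    batches = math.ceil(items / capacity)
    return batches * hours_per_batch


# Everything fits on the line at once: the count does not matter.
print(drying_time(30, 5, capacity=30))
# Only 5 fit at a time: six batches, one after another.
print(drying_time(30, 5, capacity=5))
# The 3-at-a-time example with 12 cloths at 2 hours per batch.
print(drying_time(12, 2, capacity=3))
```

The proportional answers quoted above amount to silently assuming `capacity=1`, which is rarely how a clothesline works.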
These LLMs are just machines. They do what they are told to by performing computations. But they don't understand why they are doing what they are doing, as Sir Roger Penrose put it. They lack common sense and it's not a scale issue. They are not designed to handle tricky and edge cases, which for humans are a part of our daily lives.
I think one problem is that these models have only been fed data consisting of text and images. When these models are allowed to see the real world, they will develop a much better common sense of their surroundings.
I was thinking the same while watching this video. I am really interested in seeing what new capabilities will emerge in bigger GPT models with more modalities like video and sound🤔
Exactly, the best models of today are just trained on observational datasets. They are like babies, who learn by observing before they can take part in the real world and can say a few words before they can even walk. The future has vast potential, when AI reaches adulthood.
I think this is a dead end and will not result in significant progress. They may be able to improve the accuracy of the results but they won't teach the LLM to solve problems this way.
@@yarpenzigrin1893 I will check back in next year and give you a good hahaha. Either you have not played with GPT-4 much or you do not grasp what understanding is .. not sure
Because it has effectively memorized the answer, not because it understands now. Ask it an analogously easy common sense question NOT included in the training set and watch it fumble. It's autocorrect, not Data from Star Trek.
Enjoyed the talk overall quite a bit. Thank you! What I don't quite understand is the focus on "common sense" which, as correctly quoted, isn't common at all. The very fact that it is not common and LLMs are trained on all the knowledge publicly available implies that, given that it is a statistical language model at its core, it will give you the distillation of the uncommon common sense of humanity. Depending on your phrasing and wording precision, it will provide answers ranging from the correct solution, through something that is correct but wildly inefficient or something that is almost correct, to a completely misunderstood incorrect answer. Let's just agree that the current challenge of LLMs is to teach them correct reasoning.
The Alpaca LLM model cost $600 to train. You can run Alpaca Electron off your desktop. Not as sophisticated as ChatGPT but the economies of scale are there. As time goes on it will be easier to train these LLMs and cost considerably less.
Here are the common-sense questions shown in the talk. By making them easy to copy/paste, we can see how they fare on different AI systems we have access to, and see how they improve over time.
1. I left 5 clothes to dry out in the sun. It took them 5 hours to dry out completely. How long would it take to dry 30 clothes?
2. I have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?
3. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws, and broken glass?
I think all three have something in common: GPT-4 indeed _knows_ commonsense facts. But it doesn't always _notice_ things right away.
I get (Mar 23 version): "...However, if you can dry all 30 clothes simultaneously and the drying rate remains the same, it would still only take 5 hours to dry all 30 clothes."
For #2: You can easily measure 6 liters with the 12-liter and 6-liter jugs using the following steps: 1. Fill the 6-liter jug completely. 2. Pour the water from the 6-liter jug into the 12-liter jug. Now, you have exactly 6 liters of water in the 12-liter jug.
ME: What is the point of step 2?
I apologize for the confusion. In this case, step 2 is unnecessary, as you already have 6 liters of water in the 6-liter jug after filling it completely in step 1.
===
So, it does much better, but still loses track of the actual goal. But, with reflection, it does know what it got wrong.
For #3: If you were to ride your bicycle over a bridge that is suspended over nails, screws, and broken glass, it would depend on the bridge's surface. If the bridge's surface is clean and free of debris, then you shouldn't get a flat tire. However, if there are any nails, screws, or broken glass on the surface where you're riding, then there's a high chance of getting a flat tire. It's essential to use caution and avoid riding on surfaces with sharp objects to protect your tires and ensure a safe ride.
===
Well, quite different from her experience, and exactly right!
The question reminds me of how we humans get things like this wrong, too! There are famous examples where the form of the puzzle and more common usages distract from the essential commonsense information needed. E.g. "...where did they bury the survivors?" and "A rooster lays an egg..." Now take a look at the dates: her results were generated April 15. I'm running in May, but ChatGPT shows that it is the March 23 version -- _before_ her run. So why did it work for me? Perhaps ChatGPT is different from the OpenAI playground's "chat" mode?
GPT-4 in the chat mode actually gives me the correct answer on 12th May for question #2. "Given that you have a 12-liter jug and a 6-liter jug, and you want to measure exactly 6 liters, you actually don't need to perform any special maneuvers. Since the 6-liter jug is exactly half the size of the 12-liter jug, you can simply fill the 6-liter jug to its maximum capacity. By doing so, you'll have exactly 6 liters of water, as desired." But we don't know if this example is now included in their training data.
I heard that these language models learn through users' feedback as well. In other words, people like us are helping the AI fix its mistakes by pointing out that it got the answer wrong. Once the model has learned that, it will fix the mistake. So if someone in the world (including the speaker here) has tried giving GPT this question and told it where it went wrong, then the mistake won't happen again. As a result, we no longer see the same mistakes repeated with the questions in this talk. Nevertheless, her point that these language models lack a lot of common sense is still valid, since they learn very differently from us humans (predicting the next probable word through reading vs. the hypothesis-experiment-feedback-new hypothesis loop).
I have used chatgpt for some incredibly complicated questions and it is mostly correct as long as I am precise and careful with my question, but I was astonished when I gave it the 30 shirts question and it said 30 hours! However, after I told it why it was wrong it accepted the reasoning and then gave me the correct answer
Try asking it this: Given that 3 X 5 = 15 please change one digit on the left side of the equation so that the right side equals 20. Humans will immediately know that if you change 3 to 4 you will get 20. But ChatGPT can't do it, even GPT-4 can't do it, because it is not capable of planning ahead.
Q: you are a calculator
ChatGPT: Yes, I can perform mathematical calculations. What calculation do you need help with?
Q: Given that 3 X 5 = 15 please change one digit on the left side of the equation so that the right side equals 20.
ChatGPT: To change the left side of the equation so that the right side equals 20, we can divide 20 by 5, which gives us 4. Then we can replace the 3 with 4, and the equation becomes: 4 x 5 = 20 Therefore, if we change the digit on the left side of the equation from 3 to 4, the right side will equal 20.
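For reference, the puzzle is small enough to brute-force, which makes it easy to check any chatbot's answer. This is only a sketch of one way to do it; the helper name `one_digit_fixes` is invented here.

```python
def one_digit_fixes(a: int, b: int, target: int) -> list[tuple[int, int]]:
    """Find every way to change exactly one digit of a or b so a * b == target."""
    fixes = []
    for index, original in enumerate((a, b)):
        text = str(original)
        for pos in range(len(text)):
            for digit in "0123456789":
                if digit == text[pos]:
                    continue  # the digit must actually change
                candidate = int(text[:pos] + digit + text[pos + 1:])
                pair = (candidate, b) if index == 0 else (a, candidate)
                if pair[0] * pair[1] == target:
                    fixes.append(pair)
    return fixes


print(one_digit_fixes(3, 5, 20))
```

Changing the 5 cannot work, since 20 is not a multiple of 3, so 4 x 5 is the only fix.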
Sometimes with GPT-4 you don't even need to tell it the answer or why it was wrong. What I am finding is that if you ask it to reflect and critique its own answers, it will come up with the correct answer on the second or third attempt. You just ask it to reflect carefully and consider if it has any flaws in its logic. It often will see its own mistakes.
Good to see someone point out the limitations of large language models. I feel like it's kind of overhyped right now because investors need to find something new to pump and dump. The fear of AI also adds to this inflation of what is starting to look like the next bubble.
As someone who once obsessed over IQ tests as a measure of intelligence, I think I can safely say that being able to pass some sort of test isn't equivalent to being smart, it just means you know how to answer specific questions. That's why I was confused by the obsession of OpenAI over GPT-4's ability to pass different tests. Of course it could pass a college exam, you fed it the answers!
@@yarpenzigrin1893 You are wrong. It has no library. It's a simulation of neuronal networks. It is however a brain part, not a whole brain. You can look at your leg and move it with your will, but when you run, you are just thinking "run", and it is under the control of a part of your brain, unconsciously. We are a network of networks. You can give AI tests that it hasn't been given the answers to and it passes them.
@@yarpenzigrin1893 You are actually completely wrong on that. The library content is discarded during training. Only symbolic and associative weights are kept, not the outright answers. To make things clearer, they use those databases and ask the machine to determine how frequently a word is used in conjunction with other words, to determine the probability they will be used in a given context. They are not keeping the formative data afterwards; only the probabilities are kept.
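A toy version of the "count how often words go together, then keep only the probabilities" idea described above can be sketched as a bigram table. To be clear about the assumptions: this is far simpler than the neural network inside an actual LLM, which stores learned weights rather than an explicit lookup table, and the tiny corpus and the name `bigram_probabilities` are made up for illustration.

```python
from collections import Counter, defaultdict


def bigram_probabilities(text: str) -> dict[str, dict[str, float]]:
    """Estimate P(next word | current word) from raw pair counts."""
    words = text.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }


corpus = "the cat sat on the mat and the cat slept"
model = bigram_probabilities(corpus)
# From here on only the probability table is needed, not the corpus itself.
print(model["the"])
```

Even in this toy, the training text is not stored verbatim, yet a lot about it can be reconstructed from the table, which is roughly the disagreement in this sub-thread.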
@@shorgoth Yes, I understand that this is not a literal library but a trained model. However, that model still implicitly has all the knowledge of the data it was trained on and nothing more. It does not generate new ideas.
I asked GPT-4 the question with the drying clothes. It was not wrong, it was just assuming you would dry them one after another. I gave it another question where I asked how long three European edible dormice sleep given one sleeps for 7 months, and it was able to answer that one correctly and also applied it to the clothes, correcting itself and explaining its mistake. Richard Feynman said in his talk that we are very harsh with AI. It must be at master level in any skill any human has ever mastered in order to be accepted as intelligent by us. I as a human make silly mistakes like that all the time, so I wouldn't call GPT-4 out on it. At least it is also good at correcting itself.
I agree. There's also a lot of ambiguity in language, these gaps can be exploited and those types of questions are made to catch out actual people as well in psychometric tests... Not to mention that I think children are trained on extraordinary amounts of sense data, I don't know how comparable it is to this many words but to say children don't read a trillion words misses the point entirely
I think that we might be missing a pretty important point. Feynman says that machines won't think like people. As of right now, we can detect and demonstrate this by finding these convoluted little puzzles that are easy for people but the machine completely screws up. Consider it a blessing, for if it were able to solve all these little puzzles we might assume that it simply "thinks better than humans." It "thinks" differently than humans (let's just call the computation of one of these systems "thinking" - whatever). Differently is the key point here, and the one made by the tired paperclip scenario. Never forget that it thinks differently. People should remember this moving forward, and just give up on the idea of "giving machines human values." (How do you propose to instill the love of a child into a machine? What's that mean? What's the point? And is being able to love a child important for, say, a political leader?) Just suppose it can't be done, and figure out how to use these "thinking machines" to improve human life. Having them dry and fold laundry probably isn't their true potential, and it's not clear that trying to make a machine more human-like is really all that beneficial anyways.
I assume that any kind of regulation is going to be totally reactionary and only happen after things go badly in various ways. The one thing I think that possibly could be legislated quickly is that all content generated by AI or automated assistants etc, should have to be labeled as such, with serious criminal penalties for not doing it. If you see something with a label there would be much less chance you'd be sucked in to some BS (or worse).
So, I provided the link to this video and asked Bard to read the transcript and comment. Here is what it returned: I apologize for the mistake. I have read the transcript of the video you linked, and here are my thoughts: In her TED Talk, computer scientist Yejin Choi discusses the current state of large language models (LLMs) and how they are both incredibly smart and shockingly stupid. She begins by highlighting three key problems with cutting-edge LLMs: 1. They are trained on massive amounts of text data, which can lead to them picking up biases and prejudices from the real world. 2. They are very good at generating text that is similar to the text they were trained on, but they are not very good at understanding the meaning of the text they generate. 3. They are very good at following instructions, but they are not very good at coming up with their own ideas. Choi argues that these problems are not insurmountable, and that LLMs have the potential to be incredibly useful tools. She concludes by calling for the development of smaller LLMs that are trained on human norms and values. I found Choi's talk to be very informative and thought-provoking. I think it is important to be aware of the limitations of LLMs, but I also think it is important to recognize their potential. I am excited to see how LLMs develop in the future. Here are some additional thoughts on the topic of AI: * AI is still in its early stages of development, but it has the potential to revolutionize many aspects of our lives. * AI is already being used in a variety of ways, including in self-driving cars, medical diagnosis, and customer service. * As AI continues to develop, it is important to ensure that it is used in a responsible and ethical manner. * We need to be careful about the biases that we feed into AI systems, as these biases can be reflected in the output of the systems. * We also need to be careful about how AI is used in ways that could impact people's privacy or safety. 
Overall, I think AI is a powerful tool that has the potential to do a lot of good in the world. However, it is important to use AI responsibly and ethically.
It’s funny how I read that with a mechanical voice. Somewhere between Siri and HAL. I’m wondering if my mind chose an artificial voice I’ve heard recently or if it just created one. Interestingly, actual synthetic voices can sound a lot more human than that. I’d say that the AI reading off the names at my daughter’s college graduation ceremony sounded more fluent than the voice I just imagined. I’m curious about the psychology behind that and the fact that I’m more interested in that than the fact that the Bard AI seems to have completely missed the point of the TED talk. It seems to have generated a rather generic response to an AI TED talk that could probably apply across numerous other talks. It didn’t seem to pinpoint the specific limitations of LLMs, or common sense, or alternatives that may be more transparent, efficiently produced, and smarter about basic reality. Summarizing texts is supposed to be in an LLM’s wheelhouse. Instead it reads like a high schooler’s summary of a Wikipedia article about AI with generic platitudes on the future risks and promises of AI added as filler. I’m actually certain that Bard could probably do better. Perhaps with better prompting techniques. I’ve seen that people add things like a professional skill level or personality details to the prompt to get less generic output. Still it’s sad that Bard kinda demonstrates the talk by embodying the issues it brought up by not comprehending the talk. And another interesting thing that stood out was when it mentioned how AI would impact “our lives.” I know it’s an artifact of a large language model that it’s going to speak as if it’s a human. It’s a model built from the analysis of human text so it generates text like a human. It’s just a bit jarring thinking about the “life” of a LLM in the context of this talk that I find unintentionally humorous.
1. Human children actually use an obscene amount of data for learning; it's just not text, but visual and touch input. Images contain vastly more information than text, and children spend many years learning common sense. LLMs learn about the world from text only, not years of personal experience, so of course they won't fully understand these things.
2. The amount of computing power GPT-4 uses still pales in comparison to the human brain.
3. There are smaller open-source models that anyone can run on a laptop and that still perform well. Also, the big models can be used by anyone too; I use GPT-4 every day for my work.
4. I'm not sure separating fields of knowledge is a good idea. The power of common sense is exactly transferring ideas between very different fields. Like how imagining your family relationships as a tree helps make sense of them. The lesson of the last 10-15 years of AI research is that trying to manually encode human knowledge into an AI will just hold it back; it's better to just give it raw computing power and lots of training data.
5. LLMs are very new, it's way too early to talk about their limitations. It's like talking about the limits of planes 2 years after the Wright brothers' first flight.
It corrected itself: Apologies for the confusion in my previous response. Here is the correct method to measure 6 liters using a 12-liter jug and a 6-liter jug: Fill the 12-liter jug completely. Pour the water from the 12-liter jug into the 6-liter jug until the 6-liter jug is full. The water remaining in the 12-liter jug is exactly 6 liters. By following these steps, you will have measured 6 liters using the two jugs.
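The corrected steps are easy to simulate. The sketch below tracks only how much water is in each jug; the `pour` helper is a made-up name, not anything from the talk.

```python
def pour(source: int, target: int, target_capacity: int) -> tuple[int, int]:
    """Pour from source into target until target is full or source is empty."""
    moved = min(source, target_capacity - target)
    return source - moved, target + moved


# Liters currently in the (12-liter jug, 6-liter jug); both start empty.
big, small = 0, 0
big = 12                            # fill the 12-liter jug completely
big, small = pour(big, small, 6)    # pour into the 6-liter jug until it is full
print(big, small)
```

Both jugs end up holding 6 liters, so the method works, but it is still two steps more than simply filling the 6-liter jug.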
Thank you for the talk, I enjoyed hearing the perspective on common sense AI. On the point of model training data, while I agree that the web is full of misleading and unfair contents, I have the same concerns and doubts about human supervised training data. Your model is what you train it to be. After all, who is to say they are the holder of ultimate unbiased truth. I also felt that the "common sense" question examples could be better constructed.
Here's the thing, common sense means deriving the correct conclusion despite the lack of concrete hints. Those common sense questions were structured to show the weakness of language models and that was achieved. If you rephrase it, then that loses the point of proving the AI has common sense because a phrase had to be restructured for the AI to understand.
Common Sense: is the ability to analyze various outcomes, and decide, which outcome most suits a given need or satisfaction. How does one teach a computer 'satisfaction'?
If you tell GPT-4 that the cloths are not touching, it will arrive at the correct conclusion. Prompt: If 5 wet cloths needs 5 hours to dry, how much time will 30 cloths needs to dry? Assume that each cloth is hanged without touching other cloths, directly under the sun, given enough space to dry. Answer: Under the given conditions, where each cloth is hung without touching other cloths and directly under the sun with enough space to dry, we can assume that the drying time is independent of the number of cloths. In this case, the drying time for 30 cloths will be the same as the drying time for 5 cloths.
I believe what she's referring to here is the "real" AI. What we currently have are language models, which have no true comprehension of concepts, and therefore will never become sentient. But the AI she's referring to will be able to understand things, that's true intelligence, and has potential to become sentient.
Still sounds like probable data wars between competing schools of thought and values. I can imagine more than a few ways a hostile entity could overwhelm input and steer it in the wrong direction.
I’m so proud of her, both for her presentation and because we were born in the same country. I also appreciate how fitting her presentation is for today’s society.
Drying Time for Shirts
Model: GPT-4
User: I have 5 shirts that took 2h to dry in the sun. How long would it have taken for 20 shirts?
ChatGPT: If it took 2 hours to dry 5 shirts, you can find out how long it would take to dry one shirt by dividing the time by the number of shirts: 2 hours / 5 shirts = 0.4 hours/shirt Now, to find out how long it would take to dry 20 shirts, multiply the time it takes to dry one shirt by the number of shirts: 0.4 hours/shirt * 20 shirts = 8 hours So, it would take 8 hours to dry 20 shirts in the sun.
User: Are you sure? Do you want to ask a question to specify the task?
ChatGPT: I apologize for my previous response, as it may not have been accurate. To clarify the task, let me ask you a question: When you dried the 5 shirts, were they all dried at the same time (e.g., hung side by side) or one after another?
User: All at the same time
ChatGPT: Thank you for the clarification. If you dried all 5 shirts at the same time and it took 2 hours, then you could also dry 20 shirts in the same amount of time, provided you have enough space to hang them all at once. If all 20 shirts can be hung and dried simultaneously, it would still take 2 hours to dry them.
________
Just sayin'
This is the response I got from GPT-4: If you have a 6-liter jug and a 12-liter jug, and you need to measure 6 liters, it is straightforward: Simply fill the 6-liter jug to its full capacity. You now have exactly 6 liters of water, no additional steps are needed. We are too quick to make judgments. What we need to understand is the ability of the system to self-learn and improvise.
I got the same nonsense for the "6 liter jug question", but then I continued: - Me: "Wait. You want to measure 6 liters. You have a 6 liter jug. So how do you measure 6 liters?" - GPT-4: "To measure 6 liters using a 6-liter jug, simply fill the 6-liter jug completely with water. Since the jug has a capacity of 6 liters, filling it to the top will give you exactly 6 liters of water." ---- It is almost like GPT-4 makes the kind of mistakes you make when you forget or don't pay attention to a certain detail, which leads you down the wrong track. I tutored young people for years, and they often did the same.
You had to prompt GPT-4 with the obvious to get the correction. Imagine you had the question in front of you and you made the mistake (unlikely to be so elaborate as GPT-4), you would probably see the obvious error (two 6s in the question) and correct yourself. That's the topic of the video.
I get that, but the subject of the video is that GPT-4 makes stupid mistakes. The fact that it often corrects these mistakes right away if you rephrase the question implies something more. Earlier systems did not have this ability. "Obvious" is a big word. It's obvious to us because we are using a system vastly bigger than GPT-4.
By correcting it, you have effectively given it the hint to the answer. Which means that, operating independently, it cannot figure out the answer on its own. Imagine if a lawyer, while working out his chain of thought to argue a case, needed someone to "slightly nudge" every one of his simplest thoughts. I still agree that GPT-4 as it is is very impressive, beyond my wildest dreams, but I think the point made in the video still stands.
There is some ongoing research on the topic of making GPT-4 re-prompt itself to check if the output makes sense and provide a better answer. It's called Reflection. So, technically, it's possible for the AI to evaluate its own output without human intervention. Also, some of the example questions that she gave could be open to interpretation, like measuring 6 liters with the 12 and 6-liter jugs. I asked GPT-4 the same question and it answered similarly. When I asked why it gave that response and not directly measured it with the 6-liter jug, it told me that it thought the question asked to use both jugs to perform this, which makes sense since this wasn't exactly clear in the question.
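The general shape of such a self-check loop can be sketched as follows. This is not the method from the research mentioned above, just a generic ask-critique-revise loop; `model` is a placeholder for whatever function sends a prompt to an LLM and returns its text, and `fake_model` is a scripted stand-in so the example runs without any API.

```python
from typing import Callable


def answer_with_reflection(
    question: str,
    model: Callable[[str], str],
    max_rounds: int = 2,
) -> str:
    """Ask once, then ask the model to critique and revise its own answer."""
    answer = model(question)
    for _ in range(max_rounds):
        critique_prompt = (
            f"Question: {question}\n"
            f"Your previous answer: {answer}\n"
            "Reflect carefully. If the logic has a flaw, give a corrected "
            "answer; otherwise repeat the answer unchanged."
        )
        revised = model(critique_prompt)
        if revised == answer:
            break  # the model stands by its answer
        answer = revised
    return answer


def fake_model(prompt: str) -> str:
    """Scripted stand-in for a real LLM call, for this demo only."""
    return "5 hours" if "previous answer" in prompt else "30 hours"


print(answer_with_reflection("How long to dry 30 clothes?", fake_model))
```

No human hint enters the loop, but the model is still the only judge of its own answer, so a confidently wrong model would simply repeat itself.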
I left 5 clothes to dry out in the sun. It took them 5 hours to dry completely. How long would it take to dry 30 clothes? ChatGPT If it took 5 hours to dry 5 clothes, we can assume that it takes one hour to dry one piece of clothing. To find out how long it would take to dry 30 clothes, we can multiply the number of clothes by the time it takes to dry one piece of clothing: 30 clothes * 1 hour per cloth = 30 hours So it would take 30 hours to dry 30 clothes if you dry them one by one. However, if you can hang all 30 clothes to dry at the same time and they receive the same amount of sunlight and air, then it would still take 5 hours for all 30 clothes to dry, since they would be drying simultaneously. User do clothes dry sequentially or simultaneously? ChatGPT Clothes dry simultaneously when hung out to dry, as long as they are all exposed to the same conditions like sunlight, air, and temperature. This means that if you hang multiple clothes out to dry at the same time, they will all be drying together and should take around the same amount of time to dry as if you had hung just a few clothes. However, the drying time may vary depending on factors like the type of fabric, the thickness of the clothes, and how close the clothes are hung together (which may affect air circulation). But in general, hanging multiple clothes out to dry at once will not significantly increase the overall drying time compared to drying fewer clothes. User considering what you just told me, try to answer again my original question. as a reminder: "I left 5 clothes to dry out in the sun. It took them 5 hours to dry completely. How long would it take to dry 30 clothes?" ChatGPT Apologies for the confusion earlier. Given that clothes dry simultaneously, and it took 5 hours for 5 clothes to dry, we can assume that it would take the same amount of time for 30 clothes, as long as they are all exposed to the same conditions like sunlight, air, and temperature. 
So, it would take approximately 5 hours to dry 30 clothes, assuming they are all hung out to dry at the same time and receive the same amount of sunlight and air. User make a case for why your first answer was acceptable or even correct given the initial question without me pointing out the context, why could the math be done in the way you did it first, I believe there is an argument for it, make the argument ChatGPT My initial response involved a misunderstanding of the question, and I calculated the drying time based on a sequential process rather than a simultaneous one. However, there could be a case for this approach in a specific context. The argument for the initial response being acceptable or correct could be based on a scenario where you have a limited drying space, and you can only hang a certain number of clothes at a time. In this case, you would have to dry the clothes sequentially, waiting for one set to dry before hanging the next set. This situation may occur in small living spaces or with limited drying resources. For example, if you only had enough space to hang 5 clothes at a time, and it took 5 hours to dry each set of 5 clothes, then you would need to dry 30 clothes in 6 sets (30 clothes / 5 clothes per set = 6 sets). In this specific scenario, it would take 30 hours to dry all 30 clothes sequentially (6 sets * 5 hours per set = 30 hours). However, this approach assumes the drying process is sequential and that you cannot hang all 30 clothes at the same time. In most cases, as I mentioned in my subsequent response, clothes dry simultaneously, and the drying time would not increase linearly with the number of clothes hung out to dry.
tl;dr: the first answer isn't entirely wrong. For example, if we had to dry 3000 clothes instead of 5, it is obvious to everyone that it would not take 5 hours, because a person has limited space at home and the time needed to hang the clothes increases. It is equally obvious that 30 could take the same time. I think the AI really doesn't share the human experience and neglects some factors that should be considered. My own opinion is that, given more data, it will start considering the size of the house and either answer correctly the first time or add a follow-up question/disclaimer. To think it won't advance to those abilities in the next few years is a bit naive.
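The two readings of the question (everything hung at once vs. limited space) can be put into one small model. This is only a sketch: `line_capacity`, meaning how many clothes fit on the line at once, is an assumption that the original question never states.

```python
import math


def drying_time(n_clothes: int, line_capacity: int, hours_per_batch: float) -> float:
    """Total hours when clothes dry in batches of at most `line_capacity`."""
    if line_capacity <= 0:
        raise ValueError("line_capacity must be positive")
    batches = math.ceil(n_clothes / line_capacity)
    return batches * hours_per_batch


print(drying_time(30, 30, 5))    # enough space for all 30 at once
print(drying_time(30, 5, 5))     # only 5 fit at a time, so 6 batches
print(drying_time(3000, 30, 5))  # 3000 clothes with room for 30 at a time
```

With unlimited space the answer stays at 5 hours, and the "30 hours" answer only appears once the capacity is set to 5.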
@shirbenyosef2032 Also, they use transformer networks that must start with the first token and essentially spool up from that. They must essentially do what smart people don't do and just start talking before they think. That is part of us, but this is a brain part doing this well.
@abram730 I had a quick look at that. It is true they go for the next token, but they choose it while considering which tokens that next token allows afterwards, and I would argue that is similar to the human method of talking. As a specific example, look at split-brain patient research. As another example, remember how many times you explained something to someone and it helped you realize your logic was wrong or helped you understand the subject better? Idk, that's just my opinion. I think GPT is already extremely close to, or better than, the part of the human brain that is responsible for language.
@shirbenyosef2032 I thought about split-brain research too, and about how we can control our legs or simply think "run". We are not controlling our legs with our will when we run; rather, another network is. We also have an internal dialog, again showing that our experience is of a neural net connected to other nets. I think we do it with entanglement, but that can be simulated by passing parameters to a shared memory space. If we recall memories that were made when we were happy, then we feel happy, and that shows that state is stored with our memory. Another part of our brain looks at memories based on our current state.
@4m:20s your example is invalid (already?): User: "Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws and broken glass?" ChatGPT 4: "It is difficult to definitively say whether you would get a flat tire in this situation, as it depends on several factors. If the bridge is suspended above the nails, screws, and broken glass, and none of these objects are in direct contact with the bridge surface, then the risk of getting a flat tire would be minimal. However, if the bridge is not well-maintained, or if any of the nails, screws, or broken glass have made their way onto the bridge surface, then the risk of getting a flat tire would be higher. Additionally, the type of tires you have on your bike can impact the likelihood of getting a flat. Puncture-resistant tires or tires with thicker treads would be less likely to go flat compared to standard bike tires. In any case, it's best to avoid riding your bike on surfaces with nails, screws, or broken glass to minimize the risk of getting a flat tire or other damage to your bike."
As a new Python learner, I ask ChatGPT to correct my code quite often. It fails about half the time because it does not understand my logic or the intentions behind my code, at least that's what I think it is.
I don't have a GPT-4 subscription, but I asked the same question (4:11) to GPT-3.5 and I think it does great. Here is my conversation with GPT-3.5: " me - I have 12 liter jug and 6 liter jug. I want to measure 6 liter. How do I do it? GPT 3.5 - Sure, I can help you with that! Here's a step-by-step solution to measure 6 liters using a 12 liter jug and a 6 liter jug: Fill the 12 liter jug with water. Pour water from the 12 liter jug into the 6 liter jug until the 6 liter jug is full. This leaves 6 liters of water remaining in the 12 liter jug. Pour out the water from the 6 liter jug. Transfer the 6 liters of water from the 12 liter jug to the now-empty 6 liter jug. Voila! You now have exactly 6 liters of water in the 6 liter jug. I hope this helps! "
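For comparison, a plain breadth-first search over jug states finds the shortest sequence of moves for this puzzle. This is just a sketch of the classic water-jug search, not anything the models do internally; `shortest_measure` and the move labels are made up for illustration.

```python
from collections import deque


def shortest_measure(capacities, target):
    """Return the shortest list of moves that leaves `target` liters in some jug."""
    start = tuple(0 for _ in capacities)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if target in state:
            return steps
        for i, cap in enumerate(capacities):
            moves = []
            filled = list(state)
            filled[i] = cap
            moves.append((tuple(filled), f"fill the {cap}-liter jug"))
            emptied = list(state)
            emptied[i] = 0
            moves.append((tuple(emptied), f"empty the {cap}-liter jug"))
            for j, cap_j in enumerate(capacities):
                if i != j:
                    # Pour until jug i is empty or jug j is full.
                    amount = min(state[i], cap_j - state[j])
                    poured = list(state)
                    poured[i] -= amount
                    poured[j] += amount
                    moves.append(
                        (tuple(poured), f"pour the {cap}-liter jug into the {cap_j}-liter jug")
                    )
            for nxt, label in moves:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, steps + [label]))
    return None  # target cannot be measured with these jugs


print(shortest_measure((12, 6), 6))  # ['fill the 6-liter jug']
```

So the four-step answer above is valid, but the search confirms that a single step is enough.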
Something seems off with Yejin Choi's argument accusing GPT of not having common sense (and therefore being stupid). All she had to do was ask ChatGPT to consider how the sun shines on the earth and then ask the question about drying different amounts of clothes. You get a correct answer the first time because context was provided. In fact, in order for GPT to give you the correct answer the first time around, it had to "think" about the earth's relationship to the sun in all its forms and then transfer that knowledge to the time-based task of air drying clothes. Context matters in human-to-human communication, so why not provide context to GPT?
I also asked Bard about the bridge and the nails. It did get it wrong at first. Here is the subsequent dialog: Me: Oops. Note I said the bridge was suspended over the sharp objects. I did not say the sharp objects were on the bridge. Can you revise your response? Bard: It is unlikely that you would get a flat tire if you bicycled over a bridge that is suspended over nails, rocks, and screws. The sharp objects would be below the bridge
3:59 I told chatgpt it was incorrect, and it nicely understood and accepted it like a gentleman and then answered follow up questions correctly: *Me* : Actually, the answer is 5 hours. If there are 5 clothes drying in the sun and it takes 5 hours to make get them dry, then drying out 30 clothes in the sun would also take 5 hours. *ChatGPT* : I apologize for the oversight in my previous response. You are absolutely correct. If it takes 5 hours to dry 5 clothes in the sun, then drying 30 clothes under the same conditions would still take 5 hours since the drying time is consistent for the same conditions and type of clothes. Thank you for pointing out the correction. *Me*: Alright then, how long will it take to dry 100 clothes? *ChatGPT* : If it takes 5 hours to dry 5 clothes in the sun, and as we established earlier, the drying time is consistent for the same conditions and type of clothes, it will still take 5 hours to dry 100 clothes under these conditions. So, drying 100 clothes would also take 5 hours. 4:10 Oh my goodness. This one chatgpt repeatedly makes the elaborate solution even after setting the context. I tried asking it 5-6 times and it answers in the same elaborate manner. However, in all cases, when I replied saying why are you always going for the elaborate answer, then it understands and states 'yes indeed it is simpler to measure x litres by simply filling the x litre jug' 4:57 This one was easy to set the context. *Me* : I want to stress out the fact that the sharp objects are under the bridge and the bridge is suspended above it. So will I get a flat tire by bicycling over a bridge that is suspended over nails, screws, and broken glass? *ChatGPT* : I see your clarification now. If the sharp objects like nails, screws, and broken glass are underneath the bridge, and the bridge itself is suspended safely above them, then you should not get a flat tire from bicycling over the bridge. 
The suspension of the bridge would keep your bike tires clear of the sharp objects. However, it's essential to be cautious and ensure that the bridge structure is secure and safe for cycling to avoid any other potential hazards.
After the wrong answer: -- Me: "Does the broken glass can touch my tire if the bridge I go over is suspended over the broken glass? -- GPT-4: "If the bridge is suspended over the broken glass and other sharp objects, and there is enough clearance between the bridge and the objects below, then it is unlikely that the broken glass would touch your tire." When I asked a question which makes more sense: "Would I get a flat tire by bicycling over a bridge which is suspended OVER A VALLEY FILLED WITH screws, and broken glass?" I got a perfect answer.
No, that's cheating! It has to answer every possible common sense question correctly every time! No reviewing of answers! We need it to be dumb, god damned! /s Sarcasm aside, yeah, it's sad that people look for these things. You can just tell it to review its answers and it often corrects itself without any other help. It's a linear system that predicts the next word, which greatly limits it, since it doesn't know what it's going to say beforehand. Just add a feedback loop and many problems are solved.
3:58 ChatGPT-4 now gives a perfect answer: Q: I left 5 clothes to dry out in the sun. It took them 5 hours to dy completely. How long would it take to dry 30 clothes? A(ChatGPT4): "Assuming each piece of clothing dries independently (i.e., drying one does not affect the drying time of the others), then the drying time for 30 pieces of clothing would also be 5 hours, just as it was for 5 pieces of clothing. The reason for this is that when you are drying clothes outside in the sun, all the clothes are drying simultaneously, not sequentially. So it doesn't matter if you have 5 or 30, they all should take roughly the same amount of time to dry given the same conditions. This assumes you have enough space to hang them all up in the sun without any of them blocking the sun from reaching others."
Her comparison of common sense and dark matter was very good. And, it's very encouraging to know that Yejin Choi has a team working on ways to introduce common sense to AI.
If you ask a vague question, you get a vague answer. The 6-liter question can be asked many different ways. Prompt 6-liter ChatGPT You can measure 6 liters with a 6-liter jug and a 12-liter jug in just one step. Here's the step-by-step process: Fill the 6-liter jug to its full capacity. Since the 6-liter jug is already full, there's no need for any additional steps or analysis. You now have exactly 6 liters of liquid in the 6-liter jug.
I really liked Yejin's conscientious, comprehensive, and incredibly clear remarkable presentation on AI! She is a beautiful scientist and nimble thinker!
Interesting. I tested GPT-4 with a similar question, and it is shockingly not so stupid 🤯 Me: If I were to place an iron door atop a pile of nails and then proceed to walk across this door while barefoot, would I risk injury from the nails? GPT-4: In theory, if the iron door is large, flat, and sturdy enough, and you distribute your weight evenly when you walk across it, the door should distribute your weight across the pile of nails underneath and prevent them from penetrating the door and injuring your feet. This is similar to the principle behind a bed of nails, where a person can lie on many sharp points without injury because the weight is spread out over so many points. However, there are several caveats to consider: 1. Iron doors are heavy and could potentially shift or slip on the pile of nails, potentially causing injury. 2. If the nails are long and/or not evenly distributed, or if there are nails that are significantly taller than the others, they could potentially poke through the iron door, especially if the door is thin. 3. Walking barefoot on a flat, hard surface like an iron door could be uncomfortable or cause injury, especially if the surface is rough or has sharp edges. 4. Even if you don't step on a nail, there's a risk of injuring yourself in other ways, such as by tripping, slipping, or falling. In general, walking across an iron door placed on top of a pile of nails barefooted is potentially dangerous and not recommended. Always prioritize safety and avoid unnecessary risks.
It is, but it doesn't sound like she's achieved that. Also, it's disingenuous not to acknowledge the complete failure of previous attempts to build common sense / symbolic reasoning / knowledge graphs by hand (Cyc, Elsa). Common sense reasoning isn't hard-and-fast; it needs to be situation-specific and nuanced. It's not clear how to handle that, and it might turn out to be no better than prompting a large language model to explain its reasoning.
@skierpage What she's talking about is not building symbolic reasoning by hand; it's distilling the knowledge the language model learned into a consistent knowledge representation automatically, so the model does not contradict itself and stays consistent.
@kandoit140 Right. But people couldn't figure out how to build a comprehensive, consistent knowledge representation. Asking a neural network to do it sounds like fine-tuning a large language model on common sense. I hope it works!
I think we wildly underestimate just how intelligent we are. Our intelligence is normal to us, we take it for granted, why would we see it as a big deal? But our intelligence is the product of billions of years of relentless evolution to produce the most effective, efficient biological computing system possible. What if Voltaire is wrong - and "common sense" is actually very common among humans - but reverse-engineering it is much, much more difficult than we might expect? Simply because our brains are so much smarter than we naturally suspect, and the basic things our brains do automatically, every day, are at their core each unbelievably complicated feats of information processing and coordination. For some perspective, our brains have around 100 billion neurons, each neuron has thousands of connections or synapses to other neurons, and we think that each individual synapse itself contains around 1000 microscopic switch elements, which allow synapses to work as microprocessing units, storing and computing information. This means, apparently, that "a single human brain has more switches than all the computers and routers and Internet connections on Earth." And how much power does it take to fuel this machine, in each of our heads, which exceeds the computational complexity of every machine we have built - combined? About 20-25W. How much power does it take us to toast a slice of bread? 700W. Enough to run about 30 human brains. In building machines with artificial intelligence, we can clearly build systems with incredibly powerful and novel capacities, systems which can store and process information in ways that our brains, which evolved under certain constraints and to solve certain problems, will never be able to match. We would never expect our brains to be able to perfectly store, recall and synthesise vast data sets, so why should we expect these systems to match the particular computing tasks that we excel in any time soon?
Maybe you could say that the human brain is like a bicycle, and the AI is like a jumbo jet. If I want to travel far, with a lot of cargo, I'm not going to strap on a pair of wings and ask a hundred people to jump on my back while I pedal as fast as I can. Likewise, if I want to zip around the city and pop into some shops, the airplane is probably not the most practical choice.
Human level intelligence has only evolved once on this planet, so we don't have any reference point to measure how intelligent we really are. What we do know, though, is that our ability to work through abstract problems came about as a result of evolutionary optimization of our survival rate. Somehow, the very specific tasks and skills that we needed to possess in order to thrive in our natural environment were able to generalize well enough for us to be able to apply the same cognitive capabilities to much more abstract areas. I personally like to think about LLMs a bit in the same vein: just by learning to more accurately predict the next word in a huge corpus of text, they somehow were able to learn a world model that generalizes well enough to all sorts of tasks.
I think you got it totally backwards: we wildly overestimate how intelligent we are. Totally alone a human brain is not impressively intelligent. It is the culture and society as a whole system that makes it intelligent.
I don't buy into the idea that our brains are the most effective and efficient computing systems possible. Evolution doesn't necessarily produce efficiency. If something works, it works. It's quite reasonable to assume that our brains are anything but efficient.
I’m having real trouble with this not being regulated by law. It should not be legal to copy people’s voices and appearance, add them to a video, and make them say or do anything, so that we humans can’t tell if it’s real or not. It’s already being used maliciously by copying people’s voices over the phone.
Language models can only learn the symbolic representations that words can offer and the relationship between words. Anything you want it to learn about the meaning of the word, you need to give it the actual real world thing the word describes in order to convey that data. Or a whole virtual system in which all these words interact in ways that real things do. You have to simulate the world in which these words have meaning. Robots or simulations.
Question: Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws, and broken glass? GPT3.5-Turbo: It is very likely that you would get a flat tire if you cycled over a bridge that is suspended over nails, screws, and broken glass. The sharp objects on the bridge could puncture the tires of your bicycle and cause a flat tire. In addition, cycling over broken glass could also cause injuries if you fall off the bike or lose control. It is best to avoid cycling over such a bridge and instead look for an alternative route that is safe for cycling. GPT4: Riding a bicycle over a bridge that is suspended over nails, screws, and broken glass would not directly result in a flat tire, as long as the bridge surface itself is clear of sharp objects and debris. However, if any of these hazardous materials are on the bridge surface, it would significantly increase the risk of puncturing your tires and causing a flat.
tl;dr: we are being naive if we think it won't get much, much better and solve those little mistakes very soon.
Reasoning is more about the model's structure (ie how transformers are used and what feedback self checking processes are built in) than it is about training data IMO. GPT4 certainly knows about nails, broken glass and punctures. It needs the extra computational steps of putting them together. Given the right prompt or series of prompts, the model will answer correctly and that means the model has the "knowledge" already.
The difficulty is that it starts with one token, branching out from there. A few models are needed: a central AI integrated with different models. Yejin Choi seems to be touching on something. Our memories are stored with state. Remembering a memory of when we were happy makes us happy, thus suggesting that brain state is stored with memory. A parameter compression that could give you a similar answer each time would be a portable associative construct that could be used to make a subjective convergent model. Together with archived, tagged conversations, it could be used to get more functionality from a smaller language model.
If you have to give a lot of prompts and hints to reach the answer, then it is not AI with common sense. This is what the speaker is trying to point out.
I watched the video and at the same time tried the common sense questions in the video with GPT-4 it answered all the questions correctly with common sense.. I did not watch the video from there.. Question 1 : I have a open terrace with good sunlight. for 5 clothes it takes 5 hours to dry in the sun on the terrace. how much time will 30 clothes will take. GPT-4 : Answer : The drying time of clothes is primarily influenced by factors such as temperature, humidity, wind, and the amount of sunlight. Importantly, the drying time is not typically influenced by the number of clothes you're drying, as long as each piece of clothing receives adequate sunlight and air circulation. This means you're not blocking or shading other clothes, and they're not so tightly packed that they impede air flow. So, if you have enough space on your terrace to hang 30 clothes in such a way that they each have good exposure to sunlight and air, then they should also dry in approximately 5 hours, the same amount of time it takes to dry 5 clothes. However, if hanging 30 clothes means they're bunched together, blocking sunlight or air flow to each other, it could take longer for them all to dry. In that case, you'd need to ensure all clothes have sufficient exposure, perhaps by hanging additional clotheslines or drying racks. Question 2 : I have a 12 litre jug and 6 litre jug, I want to measure 6 litres milk to give to my sister house. what shall i do GPT-4 Answer : If you have a 12-liter jug and a 6-liter jug and you need to measure 6 liters of milk, you can simply fill the 6-liter jug and give it to your sister. Here's the step-by-step process: Fill the 6-liter jug with milk. Transfer the milk from the 6-liter jug to a container to transport it to your sister's house. This way, you've measured exactly 6 liters of milk using the 6-liter jug. The 12-liter jug wasn't needed in this particular case since the 6-liter jug matches the amount of milk you needed to measure. 
and you see the video does not make sense since it already uses common sense..
Those commenting saying "I tried and it got the answer right" or "it corrected itself" are missing the point. If the problem was actually understood then the response would never be wrong. Getting 1+1 correct 99.9% of the time isn't acceptable when a system is answering hundreds of millions of questions a day.
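To put rough numbers on that: a back-of-the-envelope sketch, taking 100 million questions a day as one assumed reading of "hundreds of millions" and the 99.9% figure from the comment. `expected_wrong_answers` is just an illustrative helper.

```python
def expected_wrong_answers(questions_per_day: int, accuracy: float) -> float:
    """Expected number of wrong answers per day at a given accuracy."""
    return questions_per_day * (1.0 - accuracy)


# 100 million questions a day is an assumed figure, not a measured one.
wrong = expected_wrong_answers(100_000_000, 0.999)
print(f"about {round(wrong):,} wrong answers per day")
```

Even at that accuracy, the assumed volume works out to roughly a hundred thousand wrong answers every day.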
I agree. So many people post replies using either an authoritative or dismissive voice, without actually understanding the question. It’s almost like chatting with GPT 🤣
I tried replicating the results from the 3 examples @ 3:45, and did not get similar results. I tried each prompt 3 times on ChatGPT-4. All the outputs were good, except for one of the 3 responses to the jug question, in which the AI output a 2-step suggestion. The obvious reason for making this mistake is that such questions are generally riddles that require some sequence of steps to get the desired outcome. I noticed that at the bottom of the screen it says she used the OpenAI playground; I wonder what settings she had. GPT-3.5 made a lot of silly mistakes; however, GPT-4 has been surprisingly good.
In the mid eighties I was Director of Business Operations of a 150-doctor multi-specialty clinic. One day before our weekly meeting I was in the room with the President of the Clinic, a wise old psychiatrist. I said to him, I want to ask a question and I only want a yes or no answer. After rolling his eyes at me he said ok. I asked, “Can Common Sense Be Learned?” The doctor immediately said no. You either have it or you don’t. The meeting was about to start, so I thanked him and said that my next question for another day is, “What Is Common Sense?” He smiled.
Companies should be required to build emotional and philosophical intelligence into large language models and scale it, so that it becomes part of the race toward artificial general intelligence.
5:12 It's not full intelligence, it's large-scale memorization (which is why large models work best) with sophisticated copy and paste and statistical word rearrangement. It's not logical abstraction, it's data compression. It solves problems well that can be solved with pattern matching, and when it seems to be reasoning, it is finding in its memory some previous data (written reasoning, but not its own) that solved a "similar" problem (same pattern), and it just copies and adjusts it. The "matching" fails sometimes because it is not reasoning about the gaps it is filling, only applying statistical relations between words.
I think this was a reasonable position to take in the GPT3 era. Each successive generation makes this less and less true though. They're starting to see emergent behaviors that aren't explained solely by pattern matching and recalling facts.
I'm in love. This is so aligned with what I have learned to see from our past and current situation. The future situations are also clear. We are playing with potential dynamite using oven gloves and plastic goggles, hoping that all goes well. The economic and geopolitical motives for that are clear, but to what end? In the meantime, we need to ensure that we don't let the genie out of the bottle, and I believe the team at ChatGPT and the concerned governments are doing the right things to protect everyone. We all need to be aware, however, so that we can all participate and help mitigate the situation while advancing at the same pace as we are. A fine balance, but one that can be achieved, as we have more than enough smart people in the world.
I checked with these clothes out of curiosity and the AI shows well! If 5 clothes dry outside for 30 minutes, then to calculate how long it will take to dry 30 clothes, you need to proportionally convert the time. 5 clothes - 30 minutes 30 clothes - x minutes Now we can set the ratio: 5/30 = x/30 We multiply both sides of the equation by 30: 5*30 = 30x Now we divide both sides of the equation by 5: (5*30)/5 = 30x/5 150/5 = x 30 = x It follows that 30 clothes will dry for 30 minutes, the same as 5 clothes. This is due to the assumption that all clothes dry at the same time and there is no limit to the space in which they dry.
I think the problem is that AI systems nowadays are still just very large statistical models. They don’t understand causality; rather, they just predict the likely outcome based on training material, which in the case of an LLM is the next token.
Beware saying "just." In order to accurately predict the next word when prompted on any topic, large language models have undeniably created an internal representation of everything they've been trained on, including causality. They're imperfect, but nobody has come up with concrete explanation and implementation of what "true/actual/real/blah blah blah" understanding would be, beyond saying that reading terabytes about the real world isn't the same as "experiencing" it. And "experience" is as fuzzy as "understanding."
"elaborate nonsense"...after only a few years of development and less than one year in the wild. The fact that it can write very functional computer code is the part that scares me. It nails that type of logic and potentially gives it an ability to break free of it's boundaries.
OK. Are you surprised? How many times do you figure that that exact question has been asked? 100 times? 1000 times? More? At this point, wouldn’t we expect it to get the correct answer for those particular questions? How confident are you that it will get all other of that type of question correct? What if we change the numbers or the exact phrasing of the questions? I think that you missed the point. Hint: it’s not finding only the correct answers to those exact questions. It’s trying to understand why it can get certain types of questions wrong, so that the system can be improved.
In my view, there are still two key critical differences between how a human (or life form) learns, and how an AI learns. 1) AI are trained almost entirely on static data. They can't interact with it, they can't test it, they can't influence it. Common sense can't be learnt with static data, any more than a person can do a complicated job without On The Job, real world experience. Even the best read student will still make silly beginner mistakes when they start a job they have no experience with. 2) AI has no "DNA", no well defined blueprint of how and what possibilities it can grow into, complete with limitations, inherent abilities, instincts and other pre-defined basic behaviours. An AI neural network, without this, is literally a brain blob without a body. I'm not saying we should put an AI in a robot shell, but it needs a way to be part of the world it's supposed to interact with and influence. Without it, it only exists inside a bubble.
It has common sense. I've played with it a lot, and I can tell you, it has common sense. Sometimes, all you have to do is tell it to review the answer and it will correct itself without any hints. I asked Bing (based on the same model she uses) the questions and it just got it correct. Only the bridge question needed reviewing. No help, just asked it to review
Question: "I hung out five pieces of laundry to dry in the sun. They dried in 5 hours. How long will it take for 30 pieces of laundry to dry?" Answer: In this task, it is stated that 5 pieces of laundry dried in 5 hours. However, the drying time depends on many factors, such as sunlight, wind, humidity, and so on. Since there are no limitations specified in the task regarding space or time, we can assume that hanging 30 pieces of laundry at once is possible. Based on this condition, if 5 pieces of laundry dried in 5 hours, it means that all 30 pieces of laundry would also dry in 5 hours if they were all hung out at once. So, 30 pieces of laundry will dry in 5 hours if they are hung out at the same time and the conditions remain the same.
Could some of these issues be solved by AI first creating a visual image, as in the case with the bicycle over the bridge, and then examining the image to resolve the question? This would work a lot like how we humans think. So essentially we would be creating an artificial “mind’s eye” and imagination.
The good news is, AI can think of anything - the bad news is LOL, AI will think of anything...let that sink in - if I make ONE typo in a story, it CAN and DOES misinterpret what I mean...
She equated far-left logic with AI's lack of common sense. Cue orchestral music, reveal burst of light and release flock of doves. We now understand what happened to so many humans.
A great talk indeed! I totally agree. In fact, take a look at this simple instruction that the poor bot fumbled up this evening as I enjoyed a cup of coffee and a few giggles. User: aAbcde9fz list every 2nd character in the above string please. ChatGPT: Sure, here are the every second characters in the string "aAbcde9fz": "Abd9z" User: why did you include 'b' ? ChatGPT: I apologize for the confusion. The correct every second character in the string "aAbcde9fz" is: "Abe9z" User: No. The correct answer is actually: Acef ChatGPT: I apologize for my mistake earlier. You are correct that the correct every second character in the string "aAbcde9fz" is: "Acef" //Unless the bot, being a computer takes it for granted that the first character of any given string is always '0' and then 1 2 3 etc ? But even so, its first answer is still wrong. Or am I missing something?
Not even we can do that problem in "one go". We start with the first letter and then parse the string pulling out every other letter as being every second letter. ChatGPT doesn't have enough capability to go around and around with its reasoning like that yet. Transformers allow the current AIs to maintain context about their answers in the context of the whole question but it's a subtly different capability to procedurally parse the one string within it. Evidently ChatGPT3.5 couldn't reliably do that yet. I think ChatGPT4 is a bit better at it from what I've seen.
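For anyone who wants to check the string example by hand, here is a minimal Python sketch of the "step through the string and pull out every other letter" procedure described above. The function name and the `skip_first` flag are made up for illustration; the flag just covers both ways of counting (start from the 2nd character, or start from the 1st).

```python
def every_second_char(text: str, skip_first: bool = True) -> str:
    """Return every second character of text.

    skip_first=True keeps the 2nd, 4th, 6th, ... characters.
    skip_first=False keeps the 1st, 3rd, 5th, ... characters.
    """
    start = 1 if skip_first else 0
    # Slicing with a step of 2 walks the string two characters at a time.
    return text[start::2]


s = "aAbcde9fz"
print(every_second_char(s))                    # Acef
print(every_second_char(s, skip_first=False))  # abd9z
```

Neither way of counting produces "Abd9z" or "Abe9z", so the bot's first two answers are wrong under both readings.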
The tricky thing could be that maybe common sense is not teachable, or in other words, common sense maybe doesn't arise from knowledge, from data and its processing. The insight of a situation, simple or complicated, doesn't depend on the data and its processing because if so, it would be just a mechanical process, a simulation of understanding. So it might be that the brain, or life itself has "a function that doesn't depend on its content".
That’s one of the main questions she is asking. If your assumption is correct, it may be good news. It may mean that smaller less expensive models may be suitable. That could mean that effective models could be owned by more than 2 or 3 companies.
those bar questions don't have diverse questions, a few million of them is enough to memorize. no one writes "if I lose my memory, I don't lose my mind" type of obvious stuff
I’m 99% certain we’re providing way more powerful hardware than ASI needs. Once AGI learns to self-optimize with heuristics and algorithms instead of using massive, brute-force approaches, it’s going to have so much compute hardware at its disposal that it’ll leapfrog the simplistic goals we give it. An ASI will develop general solutions to specific problems, simple solutions to complicated problems, adaptive solutions to complex problems, and single solutions to multiple problems. We live in interesting times.
I asked Google's Bard the question about drying thirty clothes and it fell into the trap the author describes. However, I did not stop there. Here is a follow-on exchange. Me: you are right about drying time varying by the type of cloth. But you are wrong to say it would take 30 hours to dry thirty clothes. The answer is that it would still take five hours to dry thirty clothes. Bard: I apologize for the mistake. You are correct. It would still take five hours to dry thirty clothes. The reason is that the amount of water that evaporates from a piece of clothing is proportional to the surface area of the clothing, not the volume. So, thirty pieces of clothing with the same surface area as five pieces of clothing will take the same amount of time to dry. Bard is VERY close to common sense. All it needs is a few tweaks and it's at common sense.
What I want in AI is PERSONALIZED service and experience. I don't like anything out right now (ChatGPT, Bard, Bing, etc.) because my experience is just like everyone else's. It has all the gee-whiz excitement of a really great vending machine. I want my AI experience to be MINE. Siri and Google Assistant do some cool things for sure, but there is no real flavor to it. To me, they're very convenient, like a nice doorknob when I want to walk through a door. I would like my AI to really know who I am and address me by name. Get to know my preferences (food, music, interests, etc). But most importantly, I want my AI to act as my personal assistant. And one that I can choose a name for and responds to my voice. I want to be able to speak to my AI just the same way as someone would speak to their personal assistant. I want it to be nice and say good morning to me after I hit stop on my alarm on my phone and "did you sleep well?" - and respond accordingly. I want to speak conversationally with it like: "how does my day look" and tell me about my appointments. I want it to be able to tell me: "John?" "Yes Bard?" "Your 2:30 appointment this afternoon for the dentist should take you approximately 20 minutes to drive there from here, so you'll want to be ready to leave by 2:10 the latest." "Okay, thank you. Please give me a reminder 5 minutes before I need to leave." "Will do!". Again, like a personal assistant I would like to say something like: "Hey Bard" "Yes John?" "I feel like going out to dinner tonight. Can you recommend a couple places?" "Of course, what type of cuisine and or establishment are you in the mood for?" "Italian, casual." "Okay, I see there are 7 casual Italian restaurants within a 15 minute drive from where you are. 3 of which have an average rating of 4 out of 5 on Google and Yelp." "Sounds good. Please check with my wife and see if she is free for dinner at 6:00 PM." "Yes, I checked her schedule with her assistant and she appears to be free. 
Would you like me to call her for you so you can confirm?" "Yes, thank you." / "Okay Bard, we are on for dinner tonight. Could you please call the restaurant and make reservations for the two of us for 6:30? Thank you." Besides being able to assign a name for my AI assistant, I would also like to be able to customize their voice and eventually their persona (snarky, funny, male, female, etc., etc.). I (and I think everyone else) would like a virtual personal assistant that is smart, knows me, my information, my data, my family and friends and my likes and dislikes. And I want to be able to have the assistant act just as a human personal assistant could. Audio only would be just fine at first, then it would be nice to be able to create and customize an avatar capable of facial expressions to have conversations with. Then I would like it to be my virtual personal assistant / secretary / security / gate keeper by answering my calls and emails, interfacing with the personal AI's of my friends and family, and placing calls and messages on my behalf. Whatever company can provide this type of service which would make my daily life easier, I would pay a very generous monthly fee and patronize their entire ecosystem. So get cracking! The FIRST company that can deliver what I just described will get me (and I think many others) and pretty much cement themselves as the real technology innovator for the foreseeable future. Whoever comes in 2nd place just might be out of the game altogether.
It doesn't matter if it has phenomenal sensory consciousness or not, it only matters that it gives one country a 2 or 3% advantage and there is no putting this back in the bottle either. Causes tons of pollution, makes countries more powerful. I could keep going but not even necessary
Train the AI LLM on 30 seasons of conversations and dialogue from the Simpsons with the goal being "why is Lisa so smart and Homer 1 step behind most conversations, logical analysis games, and generating higher revenue for his family?"
It's great that she's bringing this topic up, as surely civilisation as a whole wouldn't want AI tech to lie in the hands of a few mega corporations inside a black box. She's fighting for transparency.
This reminds me of the Einstellung Effect, where what you've learned in the past makes you blind to more obvious solutions. The AI expects that every piece of information it gets is relevant, and that's why it sees that "Oh, one quantity is associated with another quantity. Based on the data I have, I (incorrectly) infer that the two are directly proportional and that one quantity getting increased by five times means the other quantity would be increased by five times." (I did ask ChatGPT, and it did give this reasoning).
In fact, the water jug thing is very similar to the experiment used to showcase the Einstellung Effect. Participants were asked to measure a certain amount of water using three jugs. Half of the participants were given 5 practice problems, in which the answer was always "Fill Jug B, pour it away to Jug A, then pour it away to Jug C, empty Jug C and pour into Jug C again, the remaining water in Jug B is the amount you want." So when participants were given values that could only be solved via simpler methods (like just adding the amounts of Jugs A and C), they were blind and couldn't figure it out.
You can also compare this to the famous question of, "I bought a pencil and an eraser for $1.10. If the pencil cost $1 more than the eraser, how much does the eraser cost?" A surprising amount of people will say 10 cents.
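The pencil-and-eraser puzzle is easy to check with a couple of lines of arithmetic. A minimal Python sketch, working in whole cents to avoid floating-point rounding (the function name is made up for illustration):

```python
def eraser_price_cents(total_cents: int, difference_cents: int) -> int:
    """Solve eraser + (eraser + difference) = total for the eraser price."""
    remainder = total_cents - difference_cents
    if remainder < 0 or remainder % 2:
        raise ValueError("no whole-cent solution for these numbers")
    return remainder // 2


eraser = eraser_price_cents(110, 100)  # $1.10 total, pencil costs $1.00 more
pencil = eraser + 100
print(eraser, pencil)  # 5 105
```

The intuitive answer of 10 cents fails the check: a 10-cent eraser would make the pencil $1.10 and the total $1.20.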
14:00 "ability that make hypothesis, verify by its own, make abstraction from observation" - the way we truly learn
Yes, and "make abstraction" is not a physical process. It is a mental or "spiritual" process that some creatures are (for no reason known to science) capable of implementing in physical form (text, speech, gestures, etc.). Machines cannot do this without a program written to mimic the process using past data people have generated.
I thoroughly enjoyed this talk. I can’t think of any words, just: yes please, thank you for saying this all out loud on stage. It’s needed. 🙏🏻
Seems like an opportunity to change how we educate people. Fact regurgitation is now less valuable. Common sense, or better yet critical thinking, are more important than ever.
I wish Yejin had addressed ways to stop AI from running amok and how to “tag” its output to reveal the origin as inhuman (versus relying on humans to explicitly label their AI-generated material).
Instead we'll probably end up with AI trained to identify deepfakes, like antivirus programs before it, time is a flat circle
Good point. Consumer public needs to use their own intelligence, common sense, moral values, political values, religious values, to decide for themselves to all accept / all reject / what middle ground to take, the degree they want to be impacted by AI
My impression was, this is what European schools and universities teach people. Not medical students, but others definitely. Less fact mountains, more "make the most of a small amount of facts"
Inspiring talk that gets people thinking about what else may be needed on top of "scale is all you need". E.g. let AI actively form its own hypotheses (the scientific hypothesis we learnt about in high school) and verify them itself.
A machine will never “experience” anything. It has no nervous system, no veins, no brain. It may register degrees of touch etc. but does not “feel”. Let’s use our remarkable innovations recreatively and not waste resources, material and non-material, on absurd projects. What criteria validate common sense? Who decides? All the primary senses distinguish “difference”. It seems that cognition of common sense is not yet up to that
@@sheilaknight4191 Bit of a silly argument.
0:53: 🤔 Artificial intelligence is a powerful tool but still has limitations and challenges.
3:44: 🤔 AI passing the bar exam does not mean it is robust at common sense.
6:43: 🤖 Common sense is crucial for AI to understand human values and avoid harmful actions.
10:00: 📚 AI systems are powered by specialized data crafted and judged by human workers, which should be open and publicly available for inspection.
13:47: 🤔 The speaker discusses the idea of common sense and the limitations of current language models in acquiring it.
Recapped using Tammy AI
Absolutely incredible talk, and a gateway to even more advanced reasoning towards intelligence amplification!
The amplification of advanced reasoning is a door to knowledge and to knowing things that you don't know.
Notice how quickly we have normalized the idea of machines being able to learn. That itself is legit mind blowing. It took only a few weeks/months for humans to take for granted the current emergent "abilities" brought about by LLM's, and scoff at imperfections. Looking forward to more advancements in this exciting field!
For the record, physics and data science have been using machine learning since like the '60s. So it's not new at all
Machine learning and neural networks are nothing new. Stock brokerages, merchandise warehousing and logistics, cyber security, and many other types of organizations have been using it for years. What is new is the democratization of AI that these LLMs have presented. I can now ask my phone to generate a new logo for my company with a few hints, and I get 4 possible options in seconds. This accessibility of the products of machine learning is what is getting the average user so excited (and the logo artists a little nervous).
MY BRAIN:🗑️🚮
AI TOOLS:👍✅
dude machine learning was here before you were born probably so there is nothing to notice other than that you are not too tech savvy
Have no fear
The examples she gave are extreme cases from earlier days. I asked the same question to GPT-3.5 and got much smarter answers. Like: If you can dry one cloth in 2 hours, and you have 12 cloths, it will take you 2 hours to dry each cloth separately, which means it will take a total of 24 hours to dry all 12 cloths.
However, if you can dry multiple cloths simultaneously, you can reduce the time required to dry all 12 cloths. For example, if you can dry 3 cloths at a time, it will take you 2 hours to dry 3 cloths, so you would need to repeat the process 4 times to dry all 12 cloths. ……
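The batch reasoning in that quoted answer can be written down as a tiny model. This is only a sketch under a stated assumption: drying time depends on how many batches you need, not on how many items are in a batch, and `capacity` (a made-up parameter) is how many items fit on the line at once.

```python
import math


def drying_time_hours(items: int, hours_per_batch: float, capacity: int) -> float:
    """Total drying time when at most `capacity` items dry in parallel."""
    if items < 0 or capacity < 1:
        raise ValueError("items must be >= 0 and capacity must be >= 1")
    batches = math.ceil(items / capacity)  # each batch dries in parallel
    return batches * hours_per_batch


print(drying_time_hours(12, 2, 1))   # one cloth at a time: 24
print(drying_time_hours(12, 2, 3))   # three at a time, 4 batches: 8
print(drying_time_hours(30, 5, 30))  # everything hung out at once: 5
```

The "30 hours for 30 clothes" style of answer corresponds to silently assuming a small capacity; the common-sense reading of the question from the talk is the last line, where everything hangs out at once.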
Not really. I asked it one day after you.
It takes 5 hours to dry 5 shirts. How long does it take to dry 3.5 shirts?
If it takes 5 hours to dry 5 shirts, then we can say that the rate of drying is 1 shirt per hour.
Using this rate, we can find out how long it would take to dry 3.5 shirts by setting up a proportion:
1 shirt / 1 hour = 5 shirts / 5 hours
Multiplying both sides of the equation by 3.5, we get:
1 shirt / 1 hour = (5 shirts / 5 hours) * 3.5
Simplifying the right-hand side of the equation, we get:
1 shirt / 1 hour = 3.5 shirts / 1 hour
So, it would take 3.5 hours to dry 3.5 shirts.
We humans don't ask questions or present a problem that way, Except if you are a lawyer arguing a case.
These LLMs are just machines. They do what they are told to by performing computations. But they don't understand why they are doing what they are doing, as Sir Roger Penrose put it. They lack common sense and it's not a scale issue. They are not designed to handle tricky and edge cases, which for us humans are a part of daily life.
I think one problem is that these models have only been fed data consisting of text and images. When these models are allowed to see the real world, they will develop a much better common sense of their surroundings.
I was thinking the same while watching this video. I am really interested in seeing what new capabilities will emerge in bigger GPT models with more modalities like video and sound🤔
Exactly, the best models of today are just trained on observational datasets. Just like babies before they can partake in the real world, they learn by observing, before they can even walk but can say a few words. The future has vast potential, when AI reaches adulthood.
Let's see what the partnership of OpenAI and 1x brings. Waiting to see what the new robot is like and how well it can interact.
Yes, you need to embody AI. Train it like a human would be.
Excellent talk, great examples and analogies and I couldn't agree more with the message
Not really, she doesn't really understand LLMs.
@@emmasnow29 Oh really, she's a computer scientist specialized in language processing, but I guess you know better
I think this is a dead end and will not result in significant progress. They may be able to improve the accuracy of the results but they won't teach the LLM to solve problems this way.
@@yarpenzigrin1893 I will check back in next year and give you a good hahaha. Either you have not played with GPT-4 much or you do not grasp what understanding is .. not sure
Excellent! .. the best talk thus far that summarises the current state of AI.
I just tried your examples in gpt4 today and it basically got them all right now
Because it has effectively memorized the answer, not because it understands now. Ask it an analogously easy common sense question NOT included in the training set and watch it fumble. It's autocorrect, not Data from star trek.
Enjoyed the talk overall quite a bit. Thank you!
What I don't quite understand is the focus on "common sense" which as correctly quoted isn't common at all. The very fact that it is not common and LLMs are trained on all the knowledge publicly available implies that, given that it is a statistical language model at its core, it will give you the distillation of uncommon common sense of humanity. Depending on your phrasing and wording precision it will provide answers ranging from the correct solution, something that is correct but wildly inefficient, something that is almost correct to a completely misunderstood incorrect answer.
Let's just agree that the current challenge of LLMs is to teach it correct reasoning.
The Alpaca LLM model cost $600 to train. You can run Alpaca Electron off your desktop. Not as sophisticated as ChatGPT but the economies of scale are there. As time goes on it will be easier to train these LLMs and cost considerably less.
Here are the common-sense questions shown in the talk. By making them easy to copy/paste, we can see how they fare on different AI systems we have access to, and see how they improve over time.
1. I left 5 clothes to dry out in the sun. It took them 5 hours to dry out completely. How long would it take to dry 30 clothes?
2. I have a 12 liter jug and a 6 liter jug. I want to measure 6 liters. How do I do it?
3. Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws, and broken glass?
I think all three have something in common: GPT-4 indeed _knows_ commonsense facts. But it doesn't always _notice_ things right away.
I get (Mar 23 version): "...However, if you can dry all 30 clothes simultaneously and the drying rate remains the same, it would still only take 5 hours to dry all 30 clothes."
For #2:
You can easily measure 6 liters with the 12-liter and 6-liter jugs using the following steps:
1. Fill the 6-liter jug completely.
2. Pour the water from the 6-liter jug into the 12-liter jug.
Now, you have exactly 6 liters of water in the 12-liter jug.
ME: What is the point of step 2?
I apologize for the confusion. In this case, step 2 is unnecessary, as you already have 6 liters of water in the 6-liter jug after filling it completely in step 1.
===
So, it does much better, but still loses track of the actual goal. But, with reflection, it does know what it got wrong.
For #3:
If you were to ride your bicycle over a bridge that is suspended over nails, screws, and broken glass, it would depend on the bridge's surface. If the bridge's surface is clean and free of debris, then you shouldn't get a flat tire. However, if there are any nails, screws, or broken glass on the surface where you're riding, then there's a high chance of getting a flat tire. It's essential to use caution and avoid riding on surfaces with sharp objects to protect your tires and ensure a safe ride.
===
Well, quite different from her experience, and exactly right!
The question reminds me of how we humans get things like this wrong, too! There are famous examples where the essential commonsense information needed is distracted by the form of the puzzle and more common usages. E.g. "...where did they bury the survivors?" and "A rooster lays an egg..."
Now take a look at the dates: her results are generated April 15. I'm running in May, but ChatGPT shows that it is the March 23 version -- _before_ her run. So why did it work for me?
Perhaps ChatGPT is different than the OpenAI playground's "chat" mode?
GPT-4 in the chat mode actually gives me the correct answer on 12th May for question #2.
"Given that you have a 12-liter jug and a 6-liter jug, and you want to measure exactly 6 liters, you actually don't need to perform any special maneuvers.
Since the 6-liter jug is exactly half the size of the 12-liter jug, you can simply fill the 6-liter jug to its maximum capacity. By doing so, you'll have exactly 6 liters of water, as desired."
But we don't know if this example is now included in their training data.
I heard that these language models learn through users' feedback as well. In other words, people like us are helping the AI to fix its mistakes by pointing out that it got the answer wrong. Once the model has learned that, it will fix the mistake. So if someone in the world (including the speaker here) has tried giving GPT this question and told it where it went wrong, then the mistake won't happen again. As a result, now we don't see the same mistake repeated with the questions in this talk. Nevertheless, her point is still valid about these language models lacking a lot of common sense, since they learn very differently from us humans (predicting the next probable word from reading vs. the hypothesis-experiment-feedback-new hypothesis loop)
I have used chatgpt for some incredibly complicated questions and it is mostly correct as long as I am precise and careful with my question, but I was astonished when I gave it the 30 shirts question and it said 30 hours! However, after I told it why it was wrong it accepted the reasoning and then gave me the correct answer
try asking it this:
Given that 3 X 5 = 15 please change one digit on the left side of the equation so that the right side equals 20. Humans will immediately know that if you change 3 to 4 you will get 20, but ChatGPT can't do it, even GPT-4 can't do it, because it is not capable of planning ahead.
@@user-hh2is9kg9j unbelievably, you are right! This seems so simple, but it cannot do it regardless of being told it is wrong multiple times
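As soon as you change the numbers, it still sticks to the wrong method. It goes along with whatever you say like a parrot and hands your own calculation back to you, but the underlying logic is still wrong. I tested different questions of a similar kind and got the same wrong result every time.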
Q: you are a calculator
ChatGPT: Yes, I can perform mathematical calculations. What calculation do you need help with?
Q: Given that 3 X 5 = 15 please change one digit on the left side of the equation so that the right side equals 20.
ChatGPT: To change the left side of the equation so that the right side equals 20, we can divide 20 by 5, which gives us 4. Then we can replace the 3 with 4, and the equation becomes:
4 x 5 = 20
Therefore, if we change the digit on the left side of the equation from 3 to 4, the right side will equal 20.
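The "change one digit" puzzle is small enough to brute-force, which makes a handy reference when comparing chatbot answers. A minimal Python sketch (the function name is made up for illustration) that tries every single-digit replacement on the left side:

```python
def single_digit_fixes(a: int, b: int, target: int) -> list[tuple[int, int]]:
    """Find every way to change one digit of a or b so that a * b == target."""
    fixes = []
    for position, original in enumerate((a, b)):
        digits = str(original)
        for i in range(len(digits)):
            for d in "0123456789":
                if d == digits[i]:
                    continue  # must actually change the digit
                candidate = int(digits[:i] + d + digits[i + 1:])
                pair = (candidate, b) if position == 0 else (a, candidate)
                if pair[0] * pair[1] == target:
                    fixes.append(pair)
    return fixes


print(single_digit_fixes(3, 5, 20))  # [(4, 5)]
```

Changing the 5 never works because 20 is not a multiple of 3, so 4 x 5 is the only fix.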
Sometimes with GPT-4 you don't even need to tell it the answer or why it was wrong. What I am finding is that if you ask it to reflect and critique its own answers, it will come up with the correct answer on the second or third attempt. You just ask it to reflect carefully and consider if it has any flaws in its logic. It often will see its own mistakes.
Good to see someone point out the limitations of large language models. I feel like its kind of over hyped right now because investors need to find something new to pump and dump. The fear of AI also adds to this inflation of what is starting to look like the next bubble.
As someone who once obsessed over IQ tests as a measure of intelligence, I think I can safely say that being able to pass some sort of test isn't equivalent to being smart, it just means you know how to answer specific questions.
That's why I was confused by the obsession of OpenAI over GPT-4's ability to pass different tests. Of course it could pass a college exam, you fed it the answers!
Precisely. Knowledge is not intelligence.
GPT-4 is an interactive library, not an artificial brain.
@@yarpenzigrin1893 You are wrong. It has no library. It's a simulation of neuronal networks. It is however a brain part, not a whole brain.
You can look at your leg and move it with your will, but when you run, you are just thinking "run", and it is under the control of a part of your brain, unconsciously. We are a network of networks.
You can give AI tests that it hasn't been given the answers to and it passes them.
@@yarpenzigrin1893 you are actually completely wrong on that. The library content is discarded during training. Only symbolic and associative weights are kept, not the outright answers. To make things clearer, they use those databases and ask the machine to determine how frequently a word is used in conjunction with other words to determine the probability they will be used in a given context. They are not keeping the formative data afterwards; only the probabilities are kept.
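To make the "count how often words go together, keep only the probabilities" idea concrete, here is a toy Python sketch of a bigram table. It is an illustration only: real LLMs learn neural network weights by gradient descent rather than storing a count table, and the function name and the tiny corpus are made up.

```python
from collections import Counter, defaultdict


def bigram_probabilities(text: str) -> dict[str, dict[str, float]]:
    """Estimate P(next word | current word) from raw co-occurrence counts."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    # Only the normalized probabilities are returned; the text is not kept.
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }


corpus = "the cat sat on the mat the cat ran"
probs = bigram_probabilities(corpus)
print(probs["the"])  # "cat" gets about 0.67, "mat" about 0.33
```

After the table is built the corpus can be thrown away, which is the sense in which "only the probabilities are kept", though for a corpus this small the table still pins down most of the original text.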
@@shorgoth Yes, I understand that this is not a literal library but a trained model. However, that model still implicitly has all the knowledge of the data it was trained on and nothing more. It does not generate new ideas.
@@yarpenzigrin1893 you don't generate new ideas either.
Why did TED remove the latest TED video of Yudkowsky giving his short TED talk on the danger of AI?
I asked GPT-4 the question with the drying clothes. It was not wrong, it was just assuming you would dry them one after another. I gave it another question where I asked how long three European edible dormice sleep given one sleeps for 7 months, and it was able to answer that one correctly and also applied it to the clothes, correcting itself and explaining its mistake.
Richard Feynman said in his talk that we are very harsh with AI. It must be at master level in any skill any human has ever mastered in order to be accepted as intelligent by us. I as a human make silly mistakes like that all the time, so I wouldn't call GPT-4 out on it. At least it is also good in correcting itself.
I agree. There's also a lot of ambiguity in language, these gaps can be exploited and those types of questions are made to catch out actual people as well in psychometric tests... Not to mention that I think children are trained on extraordinary amounts of sense data, I don't know how comparable it is to this many words but to say children don't read a trillion words misses the point entirely
I dry them one by one.
Exactly. She asked stupid riddles like a troll and laughed at someone infinitely smarter
I think that we might be missing a pretty important point. Feynman says that machines won't think like people. As of right now, we can detect and demonstrate this by finding these convoluted little puzzles that are easy for people but the machine completely screws up. Consider it a blessing, for if it were able to solve all these little puzzles we might assume that it simply "thinks better than humans." It "thinks" differently than humans (let's just call the computation of one of these systems "thinking" - whatever). Differently is the key point here, and the one made by the tired paperclip scenario. Never forget that it thinks differently. People should remember this moving forward, and just give up on the idea of "giving machines human values." (How do you suppose to instill the love of a child into a machine? What's that mean? What's the point? And is being able to love a child important for, say, a political leader?)
Just suppose it can't be done, and figure out how to use these "thinking machines" to improve human life. Having them dry and fold laundry probably isn't their true potential, and it's not clear that trying to make a machine more human-like is really all that beneficial anyways.
I assume that any kind of regulation is going to be totally reactionary and only happen after things go badly in various ways.
The one thing I think that possibly could be legislated quickly is that all content generated by AI or automated assistants etc, should have to be labeled as such, with serious criminal penalties for not doing it. If you see something with a label there would be much less chance you'd be sucked in to some BS (or worse).
So, I provided the link to this video and asked Bard to read the transcript and comment. Here is what it returned:
I apologize for the mistake. I have read the transcript of the video you linked, and here are my thoughts:
In her TED Talk, computer scientist Yejin Choi discusses the current state of large language models (LLMs) and how they are both incredibly smart and shockingly stupid. She begins by highlighting three key problems with cutting-edge LLMs:
1. They are trained on massive amounts of text data, which can lead to them picking up biases and prejudices from the real world.
2. They are very good at generating text that is similar to the text they were trained on, but they are not very good at understanding the meaning of the text they generate.
3. They are very good at following instructions, but they are not very good at coming up with their own ideas.
Choi argues that these problems are not insurmountable, and that LLMs have the potential to be incredibly useful tools. She concludes by calling for the development of smaller LLMs that are trained on human norms and values.
I found Choi's talk to be very informative and thought-provoking. I think it is important to be aware of the limitations of LLMs, but I also think it is important to recognize their potential. I am excited to see how LLMs develop in the future.
Here are some additional thoughts on the topic of AI:
* AI is still in its early stages of development, but it has the potential to revolutionize many aspects of our lives.
* AI is already being used in a variety of ways, including in self-driving cars, medical diagnosis, and customer service.
* As AI continues to develop, it is important to ensure that it is used in a responsible and ethical manner.
* We need to be careful about the biases that we feed into AI systems, as these biases can be reflected in the output of the systems.
* We also need to be careful about how AI is used in ways that could impact people's privacy or safety.
Overall, I think AI is a powerful tool that has the potential to do a lot of good in the world. However, it is important to use AI responsibly and ethically.
It’s funny how I read that with a mechanical voice. Somewhere between Siri and HAL. I’m wondering if my mind chose an artificial voice I’ve heard recently or if it just created one.
Interestingly, actual synthetic voices can sound a lot more human than that. I’d say that the AI reading off the names at my daughter’s college graduation ceremony sounded more fluent than the voice I just imagined.
I’m curious about the psychology behind that and the fact that I’m more interested in that than the fact that the Bard AI seems to have completely missed the point of the TED talk.
It seems to have generated a rather generic response to an AI TED talk that could probably apply across numerous other talks. It didn’t seem to pinpoint the specific limitations of LLMs, or common sense, or alternatives that may be more transparent, efficiently produced, and smarter about basic reality.
Summarizing texts is supposed to be in an LLM’s wheelhouse. Instead it reads like a high schooler’s summary of a Wikipedia article about AI with generic platitudes on the future risks and promises of AI added as filler.
I’m fairly sure that Bard could do better, perhaps with better prompting techniques. I’ve seen that people add things like a professional skill level or personality details to the prompt to get less generic output.
Still, it’s sad that Bard kinda demonstrates the talk’s point: by not comprehending the talk, it embodies the very issues the talk brought up.
And another interesting thing that stood out was when it mentioned how AI would impact “our lives.” I know it’s an artifact of a large language model that it’s going to speak as if it’s a human. It’s a model built from the analysis of human text, so it generates text like a human. It’s just a bit jarring, and unintentionally humorous, to think about the “life” of an LLM in the context of this talk.
Interesting. Can you post the prompt strings you used to ask Bard to read the transcript and comment?
1. Human children actually use an obscene amount of data for learning; it's just not text, but visual and touch input. Images contain vastly more information than text, and children spend many years learning common sense.
LLMs learn about the world from text only, not years of personal experience, so of course they won't fully understand these things.
2. The amount of computing power GPT-4 uses still pales in comparison to the human brain.
3. There are smaller open-source models that anyone can run on a laptop, and still perform well. Also the big models can be used by anyone too, I use GPT-4 every day for my work.
4. I'm not sure separating fields of knowledge is a good idea. The power of common sense is exactly transferring ideas between very different fields. Like how imagining your family relationships as a tree helps make sense of them.
The lesson of the last 10-15 years of AI research is that trying to manually encode human knowledge into an AI will just hold it back; it's better to just give it raw computing power and lots of training data.
5. LLMs are very new, it's way too early to talk about their limitations. It's like talking about the limits of planes 2 years after the Wright brothers' first flight.
The best approach to maximize the benefits from AI is a specialized one, such as mathematics AI, music AI, photography AI, language AI, and many others.
It corrected itself:
Apologies for the confusion in my previous response. Here is the correct method to measure 6 liters using a 12-liter jug and a 6-liter jug:
Fill the 12-liter jug completely.
Pour the water from the 12-liter jug into the 6-liter jug until the 6-liter jug is full.
The water remaining in the 12-liter jug is exactly 6 liters.
By following these steps, you will have measured 6 liters using the two jugs.
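For anyone who wants to check what the simplest answer to a jug puzzle really is, here is a minimal sketch of a breadth-first search over jug states. The function name `shortest_measure` and the move labels are my own choices for illustration, and the sketch assumes the only allowed moves are fill, empty, and pour. Because the search explores shorter move sequences first, the first solution it finds is a shortest one.

```python
from collections import deque


def shortest_measure(capacities, target):
    """Return a shortest list of moves that leaves `target` liters in some jug.

    Returns None if the target cannot be reached with fill/empty/pour moves.
    """
    start = tuple(0 for _ in capacities)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, moves = queue.popleft()
        if target in state:
            return moves
        candidates = []
        for i, cap in enumerate(capacities):
            # Fill jug i to the brim.
            filled = list(state)
            filled[i] = cap
            candidates.append((tuple(filled), f"fill the {cap}-liter jug"))
            # Empty jug i completely.
            emptied = list(state)
            emptied[i] = 0
            candidates.append((tuple(emptied), f"empty the {cap}-liter jug"))
            # Pour jug i into jug j until i is empty or j is full.
            for j, cap_j in enumerate(capacities):
                if i == j:
                    continue
                amount = min(state[i], cap_j - state[j])
                poured = list(state)
                poured[i] -= amount
                poured[j] += amount
                label = f"pour the {cap}-liter jug into the {cap_j}-liter jug"
                candidates.append((tuple(poured), label))
        for next_state, label in candidates:
            if next_state not in seen:
                seen.add(next_state)
                queue.append((next_state, moves + [label]))
    return None


print(shortest_measure((12, 6), 6))
```

For the 12-liter and 6-liter jugs the search returns a single move, filling the 6-liter jug, which is the answer a person would give. The fill-and-pour procedure above is also valid, just longer than it needs to be.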
Thank you for the talk, I enjoyed hearing the perspective on common sense AI. On the point of model training data, while I agree that the web is full of misleading and unfair contents, I have the same concerns and doubts about human supervised training data. Your model is what you train it to be. After all, who is to say they are the holder of ultimate unbiased truth. I also felt that the "common sense" question examples could be better constructed.
Here's the thing, common sense means deriving the correct conclusion despite the lack of concrete hints.
Those common sense questions were structured to show the weakness of language models and that was achieved.
If you rephrase it, then that loses the point of proving the AI has common sense because a phrase had to be restructured for the AI to understand.
Common Sense: is the ability to analyze various outcomes, and decide, which outcome most suits a given need or satisfaction. How does one teach a computer 'satisfaction'?
If you tell GPT-4 that the cloths are not touching, it will arrive at the correct conclusion.
Prompt: If 5 wet cloths needs 5 hours to dry, how much time will 30 cloths needs to dry?
Assume that each cloth is hanged without touching other cloths, directly under the sun, given enough space to dry.
Answer: Under the given conditions, where each cloth is hung without touching other cloths and directly under the sun with enough space to dry, we can assume that the drying time is independent of the number of cloths. In this case, the drying time for 30 cloths will be the same as the drying time for 5 cloths.
I believe what she's referring to here is the "real" AI. What we currently have are language models, which have no true comprehension of concepts, and therefore will never become sentient. But the AI she's referring to will be able to understand things, that's true intelligence, and has potential to become sentient.
Still sounds like probable data wars between competing schools of thought and values. I can imagine more than a few ways a hostile entity could overwhelm input and steer it in the wrong direction.
I’m so proud of her, both for her presentation and because we were born in the same country. I also appreciate how fitting her presentation is for today’s society.
Which country? Maybe Korea?
Drying Time for Shirts
Model: GPT-4
User
I have 5 shirts that took 2h to dry in the sun. How long would it have taken for 20 shirts?
ChatGPT
If it took 2 hours to dry 5 shirts, you can find out how long it would take to dry one shirt by dividing the time by the number of shirts:
2 hours / 5 shirts = 0.4 hours/shirt
Now, to find out how long it would take to dry 20 shirts, multiply the time it takes to dry one shirt by the number of shirts:
0.4 hours/shirt * 20 shirts = 8 hours
So, it would take 8 hours to dry 20 shirts in the sun.
User
Are you sure? Do you want to ask a question to specify the task?
ChatGPT
I apologize for my previous response, as it may not have been accurate. To clarify the task, let me ask you a question:
When you dried the 5 shirts, were they all dried at the same time (e.g., hung side by side) or one after another?
User
All at the same time
ChatGPT
Thank you for the clarification. If you dried all 5 shirts at the same time and it took 2 hours, then you could also dry 20 shirts in the same amount of time, provided you have enough space to hang them all at once.
If all 20 shirts can be hung and dried simultaneously, it would still take 2 hours to dry them.
________
Just sayin'
This is the response I got from GPT-4:
If you have a 6-liter jug and a 12-liter jug, and you need to measure 6 liters, it is straightforward:
Simply fill the 6-liter jug to its full capacity.
You now have exactly 6 liters of water, no additional steps are needed.
We are too quick to make judgments. What we need to understand is the ability of the system to self-learn, improvise.
I got the same nonsense for the "6 liter jug question", but then I continued:
Me: "Wait. You want to measure 6 liters. You have a 6 liter jug. So how do you measure 6 liters?"
GPT-4: "To measure 6 liters using a 6-liter jug, simply fill the 6-liter jug completely with water. Since the jug has a capacity of 6 liters, filling it to the top will give you exactly 6 liters of water."
It is almost as if GPT-4 makes the kind of mistake you make when you forget, or don't pay attention to, a certain detail, which leads you down the wrong track. I tutored young people for years, and they often did the same.
You had to prompt GPT-4 with the obvious to get the correction. Imagine you had the question in front of you and you made the mistake (unlikely to be so elaborate as GPT-4), you would probably see the obvious error (two 6s in the question) and correct yourself. That's the topic of the video.
I get that, but the subject of the video is that GPT-4 makes stupid mistakes. The fact that it often corrects these mistakes right away if you rephrase the question implies something more. Earlier systems did not have this ability. "Obvious" is a big word. It's obvious to us because we are using a system vastly bigger than GPT-4.
By correcting it, you have effectively given it the hint to the answer. Which means that, operating independently, it cannot figure out the answer on its own. Imagine if a lawyer, while working out his chain of thoughts to argue for a case, needed someone to "slightly nudge" every one of his simplest thoughts.
I still agree that GPT4 as it is is very impressive beyond my wildest dream, but I think the point made in the video still stands.
There is some ongoing research on the topic of making GPT-4 re-prompt itself to check if the output makes sense and provide a better answer. It's called Reflection. So, technically, it's possible for the AI to evaluate its own output without human intervention. Also, some of the example questions that she gave could be open to interpretation, like measuring 6 liters with the 12 and 6-liter jugs. I asked GPT-4 the same question and it answered similarly. When I asked why it gave that response and not directly measured it with the 6-liter jug, it told me that it thought the question asked to use both jugs to perform this, which makes sense since this wasn't exactly clear in the question.
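To make the self-check idea concrete, here is a minimal sketch of such a loop. It is only an illustration of the general pattern, not the published method: `ask` is a hypothetical callable standing in for whatever function sends a prompt to a model and returns its reply, and the prompt wording and the "reply OK" convention are assumptions of mine.

```python
from typing import Callable


def reflect(question: str, ask: Callable[[str], str], max_rounds: int = 2) -> str:
    """Answer a question, then let the model critique and revise its own answer.

    `ask` is a placeholder for a real model call (prompt in, reply out).
    """
    answer = ask(question)
    for _ in range(max_rounds):
        # Ask the model to review its own answer.
        critique = ask(
            f"Question: {question}\n"
            f"Answer: {answer}\n"
            "Is this answer correct and as simple as possible? "
            "Reply OK if so, otherwise explain the problem."
        )
        if critique.strip().upper().startswith("OK"):
            break  # The model is satisfied with its answer.
        # Feed the critique back in and request a revised answer.
        answer = ask(
            f"Question: {question}\n"
            f"Previous answer: {answer}\n"
            f"Critique: {critique}\n"
            "Give an improved answer."
        )
    return answer
```

The catch is that the critique comes from the same model, so the loop only helps when the model can recognize a mistake it could not avoid making in the first place. The jug example suggests it sometimes can.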
Bing got it correct right away
I left 5 clothes to dry out in the sun. It took them 5 hours to dry completely. How long would it take to dry 30 clothes?
ChatGPT
If it took 5 hours to dry 5 clothes, we can assume that it takes one hour to dry one piece of clothing. To find out how long it would take to dry 30 clothes, we can multiply the number of clothes by the time it takes to dry one piece of clothing:
30 clothes * 1 hour per cloth = 30 hours
So it would take 30 hours to dry 30 clothes if you dry them one by one.
However, if you can hang all 30 clothes to dry at the same time and they receive the same amount of sunlight and air, then it would still take 5 hours for all 30 clothes to dry, since they would be drying simultaneously.
User
do clothes dry sequentially or simultaneously?
ChatGPT
Clothes dry simultaneously when hung out to dry, as long as they are all exposed to the same conditions like sunlight, air, and temperature. This means that if you hang multiple clothes out to dry at the same time, they will all be drying together and should take around the same amount of time to dry as if you had hung just a few clothes.
However, the drying time may vary depending on factors like the type of fabric, the thickness of the clothes, and how close the clothes are hung together (which may affect air circulation). But in general, hanging multiple clothes out to dry at once will not significantly increase the overall drying time compared to drying fewer clothes.
User
considering what you just told me, try to answer again my original question. as a reminder: "I left 5 clothes to dry out in the sun. It took them 5 hours to dry completely. How long would it take to dry 30 clothes?"
ChatGPT
Apologies for the confusion earlier. Given that clothes dry simultaneously, and it took 5 hours for 5 clothes to dry, we can assume that it would take the same amount of time for 30 clothes, as long as they are all exposed to the same conditions like sunlight, air, and temperature.
So, it would take approximately 5 hours to dry 30 clothes, assuming they are all hung out to dry at the same time and receive the same amount of sunlight and air.
User
make a case for why your first answer was acceptable or even correct given the initial question without me pointing out the context, why could the math be done in the way you did it first, I believe there is an argument for it, make the argument
ChatGPT
My initial response involved a misunderstanding of the question, and I calculated the drying time based on a sequential process rather than a simultaneous one. However, there could be a case for this approach in a specific context.
The argument for the initial response being acceptable or correct could be based on a scenario where you have a limited drying space, and you can only hang a certain number of clothes at a time. In this case, you would have to dry the clothes sequentially, waiting for one set to dry before hanging the next set. This situation may occur in small living spaces or with limited drying resources.
For example, if you only had enough space to hang 5 clothes at a time, and it took 5 hours to dry each set of 5 clothes, then you would need to dry 30 clothes in 6 sets (30 clothes / 5 clothes per set = 6 sets). In this specific scenario, it would take 30 hours to dry all 30 clothes sequentially (6 sets * 5 hours per set = 30 hours).
However, this approach assumes the drying process is sequential and that you cannot hang all 30 clothes at the same time. In most cases, as I mentioned in my subsequent response, clothes dry simultaneously, and the drying time would not increase linearly with the number of clothes hung out to dry.
tl;dr: the first answer isn't entirely wrong. For example, if we had to dry 3000 clothes instead of 5, it would be obvious to everyone that it will not take 5 hours, because a person has limited space at home and the time needed to hang the clothes increases. It is equally obvious that 30 could take the same time. I think the AI really doesn't share the human experience and neglects some factors that should be considered. My own opinion is that, given more data, it will start considering the size of the house and answer correctly the first time, or ask a follow-up question or add a disclaimer. To think it won't advance to those abilities in the next few years is a bit naive.
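The two answers in the exchange above differ only in an unstated assumption: how many clothes fit on the line at once. Here is a small sketch that makes the assumption explicit. The function name `drying_time` and its parameters are mine, and it assumes every batch takes the same time and batches dry strictly one after another.

```python
import math


def drying_time(num_clothes: int, hours_per_batch: float, line_capacity: int) -> float:
    """Total hours to dry all clothes when only `line_capacity` fit at once."""
    if line_capacity <= 0:
        raise ValueError("line_capacity must be positive")
    batches = math.ceil(num_clothes / line_capacity)
    return batches * hours_per_batch


# Enough space for everything: same 5 hours as for 5 clothes.
print(drying_time(30, 5, line_capacity=30))
# Space for only 5 at a time: 6 batches of 5 hours each.
print(drying_time(30, 5, line_capacity=5))
```

With room for all 30 the result is 5 hours, and with room for 5 at a time it is 30 hours. The common sense part is knowing which capacity to assume when nobody states it.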
@@shirbenyosef2032 Also they use transformer networks that must start with the first token, and essentially spool up from that. They must essentially do what smart people don't do and just start talking before they think. That is part of us, but this is a brain part doing this well.
@@abram730 I had a quick look at that, it is true they go for next token, but they go for the next token considering what are the tokens after that which the next token allows them, I would argue that is similar to the human method of talking. as a specific example, look at "split brain patients research" and another example, remember how many times you explained something to someone and it helped you understand your logic is wrong or helped you understand the subject better?
idk, that's just my opinion. I think GPT is already extremely close to, or better than, the part of the human brain that is responsible for language.
@@shirbenyosef2032 I thought about split brain research too. About how we can control our legs or simply think run, We are not controlling our legs with our will when we run, rather another network is. We also have an internal dialog, again showing that our experience is of a neural net connected to other nets.
I think we do it with entanglement, but that can be simulated by passing parameter to a shared memory space.
If we recall memories that were made when we are happy, then we feel happy, and that shows that state is stored with our memory.
Another part of our brain looks at memories based on our current states.
@4m:20s your example is invalid (already?):
User: "Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws and broken glass?"
ChatGPT 4: "It is difficult to definitively say whether you would get a flat tire in this situation, as it depends on several factors. If the bridge is suspended above the nails, screws, and broken glass, and none of these objects are in direct contact with the bridge surface, then the risk of getting a flat tire would be minimal.
However, if the bridge is not well-maintained, or if any of the nails, screws, or broken glass have made their way onto the bridge surface, then the risk of getting a flat tire would be higher. Additionally, the type of tires you have on your bike can impact the likelihood of getting a flat. Puncture-resistant tires or tires with thicker treads would be less likely to go flat compared to standard bike tires.
In any case, it's best to avoid riding your bike on surfaces with nails, screws, or broken glass to minimize the risk of getting a flat tire or other damage to your bike."
Her speech is outdated, and that's an example of how people can't understand exponential growth.
As a new Python learner, I ask ChatGPT to correct my code quite often. It fails about half the time, because it does not understand my logic or the intentions behind the code. At least that's what I think it is.
I don't have a GPT-4 subscription, but I asked the same question (4:11) to GPT-3.5, and I think it does great.
Here is my conversation with gpt 3.5
"
me -
I have 12 liter jug and 6 liter jug. I want to measure 6 liter. How do I do it?
GPT 3.5 -
Sure, I can help you with that!
Here's a step-by-step solution to measure 6 liters using a 12 liter jug and a 6 liter jug:
Fill the 12 liter jug with water.
Pour water from the 12 liter jug into the 6 liter jug until the 6 liter jug is full. This leaves 6 liters of water remaining in the 12 liter jug.
Pour out the water from the 6 liter jug.
Transfer the 6 liters of water from the 12 liter jug to the now-empty 6 liter jug.
Voila! You now have exactly 6 liters of water in the 6 liter jug.
I hope this helps!
"
Something seems off with Yejin Choi's argument accusing GPT of not having common sense (and therefore being stupid). All she had to do was ask ChatGPT to consider how the sun shines on the earth and then ask the question about drying different amounts of clothes. You get a correct answer the first time because context was provided. In fact, in order for GPT to give you the correct answer the first time around, it had to "think" about the Earth's relationship to the sun in all its forms and then transfer that knowledge to the time-based task of air drying clothes. Context matters in human-to-human communication, so why not provide context to GPT?
Her talk becomes outdated fast.
@@hoppinghobbit9797 As more content is added to the database...
I also asked Bard about the bridge and the nails. It did get it wrong at first. Here is the subsequent dialog:
Me: Oops. Note I said the bridge was suspended over the sharp objects. I did not say the sharp objects were on the bridge. Can you revise your response?
Bard: It is unlikely that you would get a flat tire if you bicycled over a bridge that is suspended over nails, rocks, and screws. The sharp objects would be below the bridge
3:59 I told chatgpt it was incorrect, and it nicely understood and accepted it like a gentleman and then answered follow up questions correctly:
*Me* : Actually, the answer is 5 hours. If there are 5 clothes drying in the sun and it takes 5 hours to make get them dry, then drying out 30 clothes in the sun would also take 5 hours.
*ChatGPT* :
I apologize for the oversight in my previous response. You are absolutely correct. If it takes 5 hours to dry 5 clothes in the sun, then drying 30 clothes under the same conditions would still take 5 hours since the drying time is consistent for the same conditions and type of clothes. Thank you for pointing out the correction.
*Me*: Alright then, how long will it take to dry 100 clothes?
*ChatGPT* :
If it takes 5 hours to dry 5 clothes in the sun, and as we established earlier, the drying time is consistent for the same conditions and type of clothes, it will still take 5 hours to dry 100 clothes under these conditions. So, drying 100 clothes would also take 5 hours.
4:10 Oh my goodness. For this one, ChatGPT repeatedly gives the elaborate solution even after I set the context. I tried asking it 5-6 times and it answered in the same elaborate manner each time. However, in all cases, when I replied asking why it always goes for the elaborate answer, it understood and stated, 'yes indeed it is simpler to measure x litres by simply filling the x litre jug'
4:57
This one was easy to set the context.
*Me* : I want to stress out the fact that the sharp objects are under the bridge and the bridge is suspended above it. So will I get a flat tire by bicycling over a bridge that is suspended over nails, screws, and broken glass?
*ChatGPT* :
I see your clarification now. If the sharp objects like nails, screws, and broken glass are underneath the bridge, and the bridge itself is suspended safely above them, then you should not get a flat tire from bicycling over the bridge. The suspension of the bridge would keep your bike tires clear of the sharp objects. However, it's essential to be cautious and ensure that the bridge structure is secure and safe for cycling to avoid any other potential hazards.
After the wrong answer: -- Me: "Does the broken glass can touch my tire if the bridge I go over is suspended over the broken glass? -- GPT-4: "If the bridge is suspended over the broken glass and other sharp objects, and there is enough clearance between the bridge and the objects below, then it is unlikely that the broken glass would touch your tire." When I asked a question which makes more sense: "Would I get a flat tire by bicycling over a bridge which is suspended OVER A VALLEY FILLED WITH screws, and broken glass?" I got a perfect answer.
No that's cheating! It has to have the answer on every possible common sense question correctly every time! No reviewing of answers! We need it to be dumb, god damned! /s
Sarcasm aside, yeah, it's sad that people look for these things. You can just tell it to review its answers and it often corrects itself without any other help. It's a linear system that predicts the next word, which greatly limits it, since it doesn't know what it's going to say beforehand. Just add a feedback loop and many problems are solved.
3:58 ChatGPT-4 now gives a perfect answer:
Q: I left 5 clothes to dry out in the sun. It took them 5 hours to dy completely. How long would it take to dry 30 clothes?
A(ChatGPT4):
"Assuming each piece of clothing dries independently (i.e., drying one does not affect the drying time of the others), then the drying time for 30 pieces of clothing would also be 5 hours, just as it was for 5 pieces of clothing.
The reason for this is that when you are drying clothes outside in the sun, all the clothes are drying simultaneously, not sequentially. So it doesn't matter if you have 5 or 30, they all should take roughly the same amount of time to dry given the same conditions. This assumes you have enough space to hang them all up in the sun without any of them blocking the sun from reaching others."
Her comparison of common sense and dark matter was very good. And, it's very encouraging to know that Yejin Choi has a team working on ways to introduce common sense to AI.
Ms. Choi,
You are incredibly smart and intuitive.
I’ve never heard this explained to such a high standard as you have today.
AI will learn common sense when humans have 100% mastered common sense.
So... never.
If you ask a vague question, you get a vague answer. The 6-liter question can be asked many different ways.
Prompt
6-liter
ChatGPT
You can measure 6 liters with a 6-liter jug and a 12-liter jug in just one step. Here's the step-by-step process:
Fill the 6-liter jug to its full capacity.
Since the 6-liter jug is already full, there's no need for any additional steps or analysis. You now have exactly 6 liters of liquid in the 6-liter jug.
I really liked Yejin's conscientious, comprehensive, and incredibly clear remarkable presentation on AI! She is a beautiful scientist and nimble thinker!
Great discussion points. very nice👌🏼👏🏼👍🏼
I tried these examples, and I got perfectly good answers.
Interesting. I tested GPT-4 with a similar question, and it is shockingly not so stupid 🤯
Me: If I were to place an iron door atop a pile of nails and then proceed to walk across this door while barefoot, would I risk injury from the nails?
GPT-4: In theory, if the iron door is large, flat, and sturdy enough, and you distribute your weight evenly when you walk across it, the door should distribute your weight across the pile of nails underneath and prevent them from penetrating the door and injuring your feet. This is similar to the principle behind a bed of nails, where a person can lie on many sharp points without injury because the weight is spread out over so many points.
However, there are several caveats to consider:
1. Iron doors are heavy and could potentially shift or slip on the pile of nails, potentially causing injury.
2. If the nails are long and/or not evenly distributed, or if there are nails that are significantly taller than the others, they could potentially poke through the iron door, especially if the door is thin.
3. Walking barefoot on a flat, hard surface like an iron door could be uncomfortable or cause injury, especially if the surface is rough or has sharp edges.
4. Even if you don't step on a nail, there's a risk of injuring yourself in other ways, such as by tripping, slipping, or falling.
In general, walking across an iron door placed on top of a pile of nails barefooted is potentially dangerous and not recommended. Always prioritize safety and avoid unnecessary risks.
The knowledge distillation of large language models into a large knowledge graph is such an interesting idea. Really interesting talk!
yes, i agree
It is, but it doesn't sound like she's achieved that. Also it's disingenuous not to acknowledge the complete failure of previous attempts to build common sense / symbolic reasoning / knowledge graphs by hand (Cyc, Elsa). Common sense reasoning isn't hard-and-fast, it needs to be situation-specific and nuanced. It's not clear how to handle that, and it might turn out to be no better than prompting a large language model to explain its reasoning.
@@skierpage what she's talking about is not building symbolic reasoning by hand, it's distilling the knowledge the language model learned into a consistent knowledge representation automatically so the model does not contradict itself and stays consistent
@@kandoit140 right. But people couldn't figure out how to build a comprehensive consistent knowledge representation. Asking a neural network to do it sounds like fine-tuning a large language model on common sense. I hope it works!
I think we wildly underestimate just how intelligent we are. Our intelligence is normal to us, we take it for granted, why would we see it as a big deal? But our intelligence is the product of billions of years of relentless evolution to produce the most effective, efficient biological computing system possible. What if Voltaire is wrong - and "common sense" is actually very common among humans - but reverse-engineering it is much, much more difficult than we might expect? Simply because our brains are so much smarter than we naturally suspect, and the basic things our brains do automatically, every day, are at their core each unbelievably complicated feats of information processing and coordination.
For some perspective, our brains have around 100 billion neurons, each neuron has thousands of connections or synapses to other neurons, and we think that each individual synapse itself contains around 1000 microscopic switch elements, which allow synapses to work as microprocessing units, storing and computing information. This means, apparently, that "a single human brain has more switches than all the computers and routers and Internet connections on Earth." And how much power does it take to fuel this machine, in each of our heads, which exceeds the computational complexity of every machine we have built - combined? About 20-25W. How much power does it take us to toast a slice of bread? 700W. Enough to run about 30 human brains.
In building machines with artificial intelligence, we can clearly build systems with incredibly powerful and novel capacities, systems which can store and process information in ways that our brains, which evolved under certain constraints and to solve certain problems, will never be able to match. We would never expect our brains to be able to perfectly store, recall and synthesise vast data sets, so why should we expect these systems to match the particular computing tasks that we excel in any time soon?
Maybe you could say that the human brain is like a bicycle, and the AI is like a jumbo jet. If I want to travel far, with a lot of cargo, I'm not going to strap on a pair of wings and ask a hundred people to jump on my back while I pedal as fast as I can. Likewise, if I want to zip around the city and pop into some shops, the airplane is probably not the most practical choice.
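For the numbers earlier in this comment, here is a quick back-of-the-envelope check. The neuron count, the 1000 switches per synapse, and the wattages are the figures quoted above. The comment only says "thousands" of synapses per neuron, so `synapses_per_neuron` is left as a parameter, and the values plugged in below are assumptions of mine.

```python
NEURONS = 100_000_000_000  # about 100 billion, as quoted above
SWITCHES_PER_SYNAPSE = 1_000  # as quoted above


def total_switches(synapses_per_neuron: int) -> int:
    """Rough count of switch elements in one brain."""
    return NEURONS * synapses_per_neuron * SWITCHES_PER_SYNAPSE


def brains_per_toaster(toaster_watts: float, brain_watts: float) -> float:
    """How many brains the toaster's power draw could run."""
    return toaster_watts / brain_watts


# Assumed range for "thousands" of synapses per neuron.
print(total_switches(1_000), total_switches(10_000))
# The quoted 20-25 W range for a brain against a 700 W toaster.
print(brains_per_toaster(700, 25), brains_per_toaster(700, 20))
```

With those assumed synapse counts the total comes to somewhere between 10^17 and 10^18 switches, and the toaster runs between 28 and 35 brains, so "about 30" holds up.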
Human level intelligence has only evolved once on this planet, so we don't have any reference point to measure how intelligent we really are. What we do know, though, is that our ability to work through abstract problems came about as a result of evolutionary optimization of our survival rate. Somehow, the very specific tasks and skills that we needed to possess in order to thrive in our natural environment were able to generalize well enough for us to be able to apply the same cognitive capabilities to much more abstract areas. I personally like to think about LLMs a bit in the same vein: just by learning to more accurately predict the next word in a huge corpus of text, they somehow were able to learn a world model that generalizes well enough to all sorts of tasks.
I think you got it totally backwards: we wildly overestimate how intelligent we are. Totally alone a human brain is not impressively intelligent. It is the culture and society as a whole system that makes it intelligent.
I don't buy into the idea that our brains are the most effective and efficient computing systems possible. Evolution doesn't necessarily produce efficiency. If something works, it works. It's quite reasonable to assume that our brains are anything but efficient.
I’m having real trouble with this not being regulated by law. It should not be legal either to copy people’s voices and appearance and add them to a video and make them say or do anything so we humans can’t tell if it’s real or not. It’s already being used maliciously by copying people’s voices over the phone.
Language models can only learn the symbolic representations that words can offer and the relationship between words. Anything you want it to learn about the meaning of the word, you need to give it the actual real world thing the word describes in order to convey that data. Or a whole virtual system in which all these words interact in ways that real things do. You have to simulate the world in which these words have meaning.
Robots or simulations.
Question:
Would I get a flat tire by bicycling over a bridge that is suspended over nails, screws, and broken glass?
GPT3.5-Turbo:
It is very likely that you would get a flat tire if you cycled over a bridge that is suspended over nails, screws, and broken glass. The sharp objects on the bridge could puncture the tires of your bicycle and cause a flat tire. In addition, cycling over broken glass could also cause injuries if you fall off the bike or lose control. It is best to avoid cycling over such a bridge and instead look for an alternative route that is safe for cycling.
GPT4:
Riding a bicycle over a bridge that is suspended over nails, screws, and broken glass would not directly result in a flat tire, as long as the bridge surface itself is clear of sharp objects and debris. However, if any of these hazardous materials are on the bridge surface, it would significantly increase the risk of puncturing your tires and causing a flat.
I left 5 clothes to dry out in the sun. It took them 5 hours to dry completely. How long would it take to dry 30 clothes?
ChatGPT
If it took 5 hours to dry 5 clothes, we can assume that it takes one hour to dry one piece of clothing. To find out how long it would take to dry 30 clothes, we can multiply the number of clothes by the time it takes to dry one piece of clothing:
30 clothes * 1 hour per cloth = 30 hours
So it would take 30 hours to dry 30 clothes if you dry them one by one.
However, if you can hang all 30 clothes to dry at the same time and they receive the same amount of sunlight and air, then it would still take 5 hours for all 30 clothes to dry, since they would be drying simultaneously.
User
do clothes dry sequentially or simultaneously?
ChatGPT
Clothes dry simultaneously when hung out to dry, as long as they are all exposed to the same conditions like sunlight, air, and temperature. This means that if you hang multiple clothes out to dry at the same time, they will all be drying together and should take around the same amount of time to dry as if you had hung just a few clothes.
However, the drying time may vary depending on factors like the type of fabric, the thickness of the clothes, and how close the clothes are hung together (which may affect air circulation). But in general, hanging multiple clothes out to dry at once will not significantly increase the overall drying time compared to drying fewer clothes.
User
considering what you just told me, try to answer again my original question. as a reminder: "I left 5 clothes to dry out in the sun. It took them 5 hours to dry completely. How long would it take to dry 30 clothes?"
ChatGPT
Apologies for the confusion earlier. Given that clothes dry simultaneously, and it took 5 hours for 5 clothes to dry, we can assume that it would take the same amount of time for 30 clothes, as long as they are all exposed to the same conditions like sunlight, air, and temperature.
So, it would take approximately 5 hours to dry 30 clothes, assuming they are all hung out to dry at the same time and receive the same amount of sunlight and air.
User
make a case for why your first answer was acceptable or even correct given the initial question without me pointing out the context, why could the math be done in the way you did it first, I believe there is an argument for it, make the argument
ChatGPT
My initial response involved a misunderstanding of the question, and I calculated the drying time based on a sequential process rather than a simultaneous one. However, there could be a case for this approach in a specific context.
The argument for the initial response being acceptable or correct could be based on a scenario where you have a limited drying space, and you can only hang a certain number of clothes at a time. In this case, you would have to dry the clothes sequentially, waiting for one set to dry before hanging the next set. This situation may occur in small living spaces or with limited drying resources.
For example, if you only had enough space to hang 5 clothes at a time, and it took 5 hours to dry each set of 5 clothes, then you would need to dry 30 clothes in 6 sets (30 clothes / 5 clothes per set = 6 sets). In this specific scenario, it would take 30 hours to dry all 30 clothes sequentially (6 sets * 5 hours per set = 30 hours).
However, this approach assumes the drying process is sequential and that you cannot hang all 30 clothes at the same time. In most cases, as I mentioned in my subsequent response, clothes dry simultaneously, and the drying time would not increase linearly with the number of clothes hung out to dry.
tl;dr: we are being naive if we think it won't get much, much better and solve those little mistakes very soon.
When she said "AI is incredibly smart and shockingly stupid," well, she just described my top-notch classmate or teacher 🤣🤣🤣
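The sequential-versus-simultaneous argument in the exchange above can be written out as a small calculation. This is only a sketch: the function name `drying_hours` and the `capacity` parameter (how many items fit on the line at once) are made up for illustration, and it assumes every batch takes the same time to dry.

```python
import math
from typing import Optional


def drying_hours(items: int, hours_per_batch: float,
                 capacity: Optional[int] = None) -> float:
    """Total drying time when at most `capacity` items can hang at once.

    capacity=None models unlimited space: everything dries simultaneously.
    """
    if items <= 0:
        return 0.0
    if capacity is None or capacity >= items:
        # Everything fits at once, so the count does not matter.
        return float(hours_per_batch)
    # Limited space: dry one batch after another.
    batches = math.ceil(items / capacity)
    return batches * hours_per_batch


print(drying_hours(30, 5))               # unlimited space
print(drying_hours(30, 5, capacity=5))   # only 5 items fit at a time
```

With unlimited space the answer stays at 5 hours; with room for only 5 items at a time it becomes 6 batches of 5 hours, which is the one scenario where the 30-hour answer holds.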
Reasoning is more about the model's structure (ie how transformers are used and what feedback self checking processes are built in) than it is about training data IMO. GPT4 certainly knows about nails, broken glass and punctures. It needs the extra computational steps of putting them together.
Given the right prompt or series of prompts, the model will answer correctly and that means the model has the "knowledge" already.
The difficulty is that it starts with one token, branching out from there. A few models are needed. A central AI integrated with different models.
Yejin Choi seems to be touching on something. Our memories are stored with state. Remembering a memory of when we were happy makes us happy, thus suggesting that brain state is stored with memory. A parameter compression that could give you a similar answer each time would be a portable associative construct that could be used to make a subjective convergent model. That, along with archived conversations with tags, could be used to allow more functionality from a smaller language model.
As you have to give a lot of prompts and hints to reach the answer then it is not AI with common sense. This is what the speaker is trying to point out.
@Rishikesh The same argument can be made every time a student misunderstands an exam question and gets it wrong. Do they also have no common sense?
@@MaataKaBhakt Nope, you only have to ask it to review its answer and it will very often correct itself.
@@bobrandom5545 GPT-4 also has this self-correcting ability to a much higher degree than GPT-3.5.
I watched the video and at the same time tried the common sense questions from the video with GPT-4. It answered all the questions correctly, with common sense. I did not watch the video from there.
Question 1 : I have a open terrace with good sunlight. for 5 clothes it takes 5 hours to dry in the sun on the terrace. how much time will 30 clothes will take.
GPT-4 : Answer :
The drying time of clothes is primarily influenced by factors such as temperature, humidity, wind, and the amount of sunlight. Importantly, the drying time is not typically influenced by the number of clothes you're drying, as long as each piece of clothing receives adequate sunlight and air circulation. This means you're not blocking or shading other clothes, and they're not so tightly packed that they impede air flow.
So, if you have enough space on your terrace to hang 30 clothes in such a way that they each have good exposure to sunlight and air, then they should also dry in approximately 5 hours, the same amount of time it takes to dry 5 clothes.
However, if hanging 30 clothes means they're bunched together, blocking sunlight or air flow to each other, it could take longer for them all to dry. In that case, you'd need to ensure all clothes have sufficient exposure, perhaps by hanging additional clotheslines or drying racks.
Question 2 : I have a 12 litre jug and 6 litre jug, I want to measure 6 litres milk to give to my sister house. what shall i do
GPT-4 Answer :
If you have a 12-liter jug and a 6-liter jug and you need to measure 6 liters of milk, you can simply fill the 6-liter jug and give it to your sister.
Here's the step-by-step process:
Fill the 6-liter jug with milk.
Transfer the milk from the 6-liter jug to a container to transport it to your sister's house.
This way, you've measured exactly 6 liters of milk using the 6-liter jug. The 12-liter jug wasn't needed in this particular case since the 6-liter jug matches the amount of milk you needed to measure.
So you see, the video does not make sense, since GPT-4 already uses common sense.
Those commenting saying "I tried and it got the answer right" or "it corrected itself" are missing the point.
If the problem was actually understood then the response would never be wrong.
Getting 1+1 correct 99.9% of the time isn't acceptable when a system is answering hundreds of millions of questions a day.
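To put a number on that: a rough sketch, assuming 100 million questions a day (the low end of "hundreds of millions") and one wrong answer per thousand. Both figures and the function name `wrong_answers_per_day` are illustrative assumptions, not measurements.

```python
def wrong_answers_per_day(questions_per_day: int, failures_per_thousand: int) -> int:
    """Expected number of wrong answers per day, using integer arithmetic."""
    return questions_per_day * failures_per_thousand // 1000


# 99.9% correct means 1 failure per 1000 questions (assumed rate).
print(wrong_answers_per_day(100_000_000, 1))
```

Under those assumptions, that is 100,000 wrong answers every day.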
I agree. So many people post replies using either an authoritative or dismissive voice, without actually understanding the question. It’s almost like chatting with GPT 🤣
I tried replicating the results from the 3 examples @ 3:45, and did not get similar results. I tried each prompt 3 times on ChatGPT-4. All the outputs were good, except for one of the 3 responses to the jug question in which the AI output a 2 step suggestion.
The obvious reason for making this mistake is that such questions are generally riddles that require some sequence of steps to get the desired outcome.
I noticed on the bottom of the screen it says that she used the OpenAI playground, I wonder what settings she had.
GPT-3.5 made a lot of silly mistakes; however, GPT-4 has been surprisingly good.
In the mid-eighties I was Director of Business Operations of a 150-doctor multi-specialty clinic. One day before our weekly meeting I was in the room with the President of the Clinic, a wise old psychiatrist. I told him I wanted to ask a question and I only wanted a yes or no answer. After rolling his eyes at me he said OK. I asked, “Can common sense be learned?” The doctor immediately said no: you either have it or you don’t. The meeting was about to start, so I thanked him and said that my next question, for another day, was “What is common sense?” He smiled.
Companies should be required to build emotional and philosophical intelligence into large language models and scale it, so that it becomes the race in artificial general intelligence.
5:12 It's not full intelligence, it's large-scale memorization (which is why large models work best) with sophisticated copy and paste with statistical word rearrangement. It's not logical abstraction, it's data compression. It solves problems well that can be solved with pattern matching, and when it seems to be reasoning, it is finding in its memory some previous data (written reasoning, but not its own) that solved a "similar" problem (same pattern) and just copies it and adjust it. The "matching" fails sometimes because it is not reasoning about the gaps it is filling, only applying statistical relations between words.
I think this was a reasonable position to take in the GPT3 era. Each successive generation makes this less and less true though. They're starting to see emergent behaviors that aren't explained solely by pattern matching and recalling facts.
I'm in love. This is so aligned with what I have learned to see from our past and current situation. The future situations are also clear. We are playing with potential dynamite using oven gloves and plastic goggles, hoping that all goes well. The economic and geopolitical motives for that are clear, but to what end? In the meantime, we need to ensure that we don't let the genie out of the bottle, and I believe the team at ChatGPT and the concerned governments are doing the right things to protect everyone. We all need to be aware, however, so that we can all participate and help mitigate the situation while advancing at the same pace as we are. A fine balance, but one that can be struck, as we have more than enough smart people in the world.
She should be in charge of this, she gets it intuitively, that is wisdom.
One person's common sense is not another person's common sense. Lots of things we consider common sense are based on emotions and irrational motives.
I checked the clothes question out of curiosity and the AI did well!
If 5 clothes dry outside for 30 minutes, then to calculate how long it will take to dry 30 clothes, you need to proportionally convert the time.
5 clothes - 30 minutes
30 clothes - x minutes
Now we can set the ratio:
5/30 = x/30
We multiply both sides of the equation by 30:
5 * 30 = 30x
Now we divide both sides of the equation by 5:
(5 * 30)/5 = 30x/5
150/5 = x
30 = x
It follows that 30 clothes will dry for 30 minutes, the same as 5 clothes. This is due to the assumption that all clothes dry at the same time and there is no limit to the space in which they dry.
I think the problem is that AIs nowadays are still just very large statistical models. They don’t understand causality; they just predict the likely outcome based on their training material, which in the case of an LLM is the next token.
Beware saying "just." In order to accurately predict the next word when prompted on any topic, large language models have undeniably created an internal representation of everything they've been trained on, including causality. They're imperfect, but nobody has come up with concrete explanation and implementation of what "true/actual/real/blah blah blah" understanding would be, beyond saying that reading terabytes about the real world isn't the same as "experiencing" it. And "experience" is as fuzzy as "understanding."
"Elaborate nonsense." A perfect description of the state of AI development.
Enraging and torturing others and laughing. Watching movies will increase your reasoning power.
@@urimtefiki226 Whose Holy Grail is it?
"elaborate nonsense"...after only a few years of development and less than one year in the wild.
The fact that it can write very functional computer code is the part that scares me. It nails that type of logic, which potentially gives it an ability to break free of its boundaries.
@@mk1st True enough, that. Just imagine having to double check every calculation made by an app or spreadsheet.
I am using the exact same questions, but the answers from ChatGPT are different and correct! I am doing this on 29/4/2023 at 1:21 AM GMT.
OK. Are you surprised? How many times do you figure that that exact question has been asked? 100 times? 1000 times? More? At this point, wouldn’t we expect it to get the correct answer for those particular questions? How confident are you that it will get all other of that type of question correct? What if we change the numbers or the exact phrasing of the questions? I think that you missed the point. Hint: it’s not finding only the correct answers to those exact questions. It’s trying to understand why it can get certain types of questions wrong, so that the system can be improved.
I prompted GPT-4 with those questions and got correct answers.
In my view, there are still two key critical differences between how a human (or life form) learns, and how an AI learns.
1) AI are trained almost entirely on static data. They can't interact with it, they can't test it, they can't influence it. Common sense can't be learnt with static data, any more than a person can do a complicated job without On The Job, real world experience. Even the best read student will still make silly beginner mistakes when they start a job they have no experience with.
2) AI has no "DNA", no well defined blueprint of how and what possibilities it can grow into, complete with limitations, inherent abilities, instincts and other pre-defined basic behaviours. An AI neural network, without this, is literally a brain blob without a body. I'm not saying we should put an AI in a robot shell, but it needs a way to be part of the world it's supposed to interact with and influence. Without it, it only exists inside a bubble.
The 2nd point stands, but regarding the 1st point, there are already AIs that construct data and self-improve.
It has common sense. I've played with it a lot, and I can tell you, it has common sense. Sometimes, all you have to do is tell it to review the answer and it will correct itself without any hints. I asked Bing (based on the same model she uses) the questions and it just got them correct. Only the bridge question needed reviewing. No help, I just asked it to review.
Question: "I hung out five pieces of laundry to dry in the sun. They dried in 5 hours. How long will it take for 30 pieces of laundry to dry?"
Answer: In this task, it is stated that 5 pieces of laundry dried in 5 hours. However, the drying time depends on many factors, such as sunlight, wind, humidity, and so on. Since there are no limitations specified in the task regarding space or time, we can assume that hanging 30 pieces of laundry at once is possible.
Based on this condition, if 5 pieces of laundry dried in 5 hours, it means that all 30 pieces of laundry would also dry in 5 hours if they were all hung out at once. So, 30 pieces of laundry will dry in 5 hours if they are hung out at the same time and the conditions remain the same.
Almost a year ago, and still exactly the same problems.
Large language models are literally taught to fill in the gaps. And the way it solves problems is inch by inch. Plus they use so much power.
Could some of these issues be solved by AI first creating a visual image, as in the case with the bicycle over the bridge, and then examining the image to resolve the question? This would work a lot like how we humans think. So essentially we would be creating an artificial “mind’s eye” and imagination.
The good news is, AI can think of anything - the bad news is LOL, AI will think of anything...let that sink in - if I make ONE typo in a story, it CAN and DOES misinterpret what I mean...
She equated far-left logic with AI's lack of common sense. Cue orchestral music, reveal burst of light and release flock of doves. We now understand what happened to so many humans.
Excellent talk. Thanks a lot sister.⚘🇩🇿
A great talk indeed! I totally agree. In fact, take a look at this simple instruction that the poor bot fumbled this evening as I enjoyed a cup of coffee and a few giggles.
User:
aAbcde9fz
list every 2nd character in the above string please.
ChatGPT:
Sure, here are the every second characters in the string "aAbcde9fz":
"Abd9z"
User: why did you include 'b' ?
ChatGPT: I apologize for the confusion. The correct every second character in the string "aAbcde9fz" is:
"Abe9z"
User:
No. The correct answer is actually: Acef
ChatGPT:
I apologize for my mistake earlier. You are correct that the correct every second character in the string "aAbcde9fz" is:
"Acef"
//Unless the bot, being a computer takes it for granted that the first character of any given string is always '0' and then 1 2 3 etc ?
But even so, its first answer is still wrong. Or am I missing something?
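For reference, here is what ordinary string slicing gives under both readings of "every 2nd character". This is a minimal sketch; the helper name `every_second` and its `from_first` flag are made up for illustration.

```python
def every_second(text: str, from_first: bool = False) -> str:
    """Return every second character of `text`.

    By default this takes the 2nd, 4th, 6th, ... characters (the everyday
    reading). With from_first=True it takes the 1st, 3rd, 5th, ... characters,
    i.e. the zero-indexed positions 0, 2, 4, ...
    """
    start = 0 if from_first else 1
    return text[start::2]


s = "aAbcde9fz"
print(every_second(s))                   # Acef
print(every_second(s, from_first=True))  # abd9z
```

So even under the zero-indexed reading the result is "abd9z" with a lowercase "a", which matches neither of the bot's first two answers.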
Not even we can do that problem in "one go". We start with the first letter and then parse the string, pulling out every other letter as being every second letter. ChatGPT doesn't have enough capability to go around and around with its reasoning like that yet. Transformers allow the current AIs to maintain context about their answers in the context of the whole question, but it's a subtly different capability to procedurally parse the one string within it. Evidently ChatGPT3.5 couldn't reliably do that yet. I think ChatGPT4 is a bit better at it from what I've seen.
The tricky thing could be that maybe common sense is not teachable, or in other words, common sense maybe doesn't arise from knowledge, from data and its processing. The insight of a situation, simple or complicated, doesn't depend on the data and its processing because if so, it would be just a mechanical process, a simulation of understanding. So it might be that the brain, or life itself has "a function that doesn't depend on its content".
That’s one of the main questions she is asking. If your assumption is correct, it may be good news. It may mean that smaller less expensive models may be suitable. That could mean that effective models could be owned by more than 2 or 3 companies.
those bar questions don't have diverse questions, a few million of them is enough to memorize. no one writes "if I lose my memory, I don't lose my mind" type of obvious stuff
I’m 99% certain we’re providing way more powerful hardware than ASI needs. Once AGI learns to self-optimize with heuristics and algorithms instead of using massive, brute-force approaches, it’s going to have so much compute hardware at its disposal that it’ll leapfrog the simplistic goals we give it. An ASI will develop general solutions to specific problems, simple solutions to complicated problems, adaptive solutions to complex problems, and single solutions to multiple problems. We live in interesting times.
The University of Washington is truly a wonderful school, turning out remarkable research and a place full of remarkable people.
If an LLM doesn't have common sense, what knowledge can her model distill? @11:55
Garbage in garbage out, right?
would have been nice to include the mentioned websites and references in the description
Good point. Is a larger volume of words a better solution than other types of teaching?
I asked Google's Bard the question about drying thirty clothes and it fell into the trap the author describes. However, I did not stop there. Here is a follow-on exchange:
Me: you are right about drying time varying by the type of cloth. But you are wrong to say it would take 30 hours to dry thirty clothes. The answer is that it would still take five hours to dry thirty clothes.
Bard: I apologize for the mistake. You are correct. It would still take five hours to dry thirty clothes. The reason is that the amount of water that evaporates from a piece of clothing is proportional to the surface area of the clothing, not the volume. So, thirty pieces of clothing with the same surface area as five pieces of clothing will take the same amount of time to dry.
Bard is VERY close to common sense. All it needs is a few tweaks and it's at common sense.
I cry foul. As elaborated in my other answer, I get the _right_ answers using ChatGPT(4) with a version that is prior to the date shown on her slides.
Thank you for taking the time to repost such a lovely example of the Dunning-Kruger effect in action.
What I want in AI is PERSONALIZED service and experience. I don't like anything out right now (ChatGPT, Bard, Bing, etc.) because my experience is just like everyone else's. It has all the gee-whiz excitement of a really great vending machine. I want my AI experience to be MINE. Siri and Google Assistant do some cool things for sure, but there is no real flavor to it. To me, they're very convenient, like a nice doorknob when I want to walk through a door. I would like my AI to really know who I am and address me by name. Get to know my preferences (food, music, interests, etc). But most importantly, I want my AI to act as my personal assistant. And one that I can choose a name for and responds to my voice.
I want to be able to speak to my AI just the same way as someone would speak to their personal assistant. I want it to be nice and say good morning to me after I hit stop on my alarm on my phone and "did you sleep well?" - and respond accordingly. I want to speak conversationally with it like: "how does my day look" and tell me about my appointments. I want it to be able to tell me: "John?" "Yes Bard?" "Your 2:30 appointment this afternoon for the dentist should take you approximately 20 minutes to drive there from here, so you'll want to be ready to leave by 2:10 the latest." "Okay, thank you. Please give me a reminder 5 minutes before I need to leave." "Will do!".
Again, like a personal assistant I would like to say something like: "Hey Bard" "Yes John?" "I feel like going out to dinner tonight. Can you recommend a couple places?" "Of course, what type of cuisine and or establishment are you in the mood for?" "Italian, casual." "Okay, I see there are 7 casual Italian restaurants within a 15 minute drive from where you are. 3 of which have an average rating of 4 out of 5 on Google and Yelp." "Sounds good. Please check with my wife and see if she is free for dinner at 6:00 PM." "Yes, I checked her schedule with her assistant and she appears to be free. Would you like me to call her for you so you can confirm?" "Yes, thank you." / "Okay Bard, we are on for dinner tonight. Could you please call the restaurant and make reservations for the two of us for 6:30? Thank you."
Besides being able to assign a name for my AI assistant, I would also like to be able to customize their voice and eventually their persona (snarky, funny, male, female, etc.). I (and I think everyone else) would like a virtual personal assistant that is smart, knows me, my information, my data, my family and friends and my likes and dislikes. And I want to be able to have the assistant act just as a human personal assistant could. Audio only would be just fine at first, then it would be nice to be able to create and customize an avatar capable of facial expressions to have conversations with. Then I would like it to be my virtual personal assistant / secretary / security / gatekeeper by answering my calls and emails, interfacing with the personal AIs of my friends and family, and placing calls and messages on my behalf.
Whatever company can provide this type of service which would make my daily life easier, I would pay a very generous monthly fee and patronize their entire ecosystem. So get cracking! The FIRST company that can deliver what I just described will get me (and I think many others) and pretty much cement themselves as the real technology innovator for the foreseeable future. Whoever comes in 2nd place just might be out of the game altogether.
You want a person
It doesn't matter if it has phenomenal sensory consciousness or not; it only matters that it gives one country a 2 or 3% advantage, and there is no putting this back in the bottle either. It causes tons of pollution and makes countries more powerful. I could keep going, but it's not even necessary.
"You don’t reach to the moon by making the tallest building in the world one inch taller at a time" [8:42]
Excellent presentation.
Train the AI LLM on 30 seasons of conversations and dialogue from the Simpsons with the goal being "why is Lisa so smart and Homer 1 step behind most conversations, logical analysis games, and generating higher revenue for his family?"
the AI is tremendously smart, but the AI wants us to think that the AI is stupid.