4:00 To me, it seems AI is very human-like. However, it lacks the real world to apply it to. It perceives, though it can only dream. They say when you do a psychedelic, the visuals enter the dream space of your mind and incapacitate your default brain network, aka your day-to-day brain. First-timers often say it was dreamy, odd, or unusual, or that they felt like a child experiencing things for the first time. As someone who has done a decent amount of psychedelics, I would say the AI processes data as if dreaming. Or, more poetically: the AI is but a child. I wonder how the AI will act when all grown up, able to think without its parents. Us.
I don't think it can 'grow up' because it doesn't have a body and therefore no motivational frame. But this is exactly how we 'see' the world before we perceive it consciously: patterns and tools to grip. This is actually not my hot take; this is well understood in clinical psychology.
I mean, if a researcher asked me to classify an apple with a piece of paper saying "Pizza" on it, and only gave me one word to do so, how would I know what they're asking for at this very moment? Especially if before that they asked me to identify both text and images....
Ah yes, the 90s - When every other company stylised or simplified their logo to look like it was drawn by a school kid, with cheerful colours (often with a stylised Earth/globe thrown in for good measure).
I think in the case of the Pizza label on the dog, for example, it should be considered cryptic info rather than raw image info, since the label is not a pizza but it does carry the cryptic meaning of one; the same way, the noise would be considered cryptic info that has no defined meaning. If the AI can separate the cryptic info, the dog with the word pizza on it will be considered a dog with a label on it. Another example, 8:40: green would be the cryptic meaning of the raw info "green", and the AI should consider that image 100% red. Basically, writing is cryptic info, since its meaning only has value in our definitions rather than in reality.
Super interesting (as always) and many lols in this one. Though I feel like the attack is not a fair one (or maybe I am a robot too). The subject of the pictures with the "PIZZA" sticker in them IS in fact "PIZZA". If you ask a human they would probably say something like: "An apple with a sticker saying PIZZA stuck to it." or "A sticker saying PIZZA stuck to an apple." Either way the PIZZA element features prominently in the classification. A great follow-up would be to try this with a captioning model.
Magritte: "Ceci n'est pas une pipe" CLIP: "damn it, I think you may be right..." Amazing stuff. Amazing channel! Thanks so much for bringing this to our eyes.
3:45 "the wast majority of YOU humans" -> this proves it: Károly, you are actually an AI trying to make "us humans" accept and love you with all these 5 minute papers!
What I find super interesting here is that you basically ask the computer to visualise Plato's allegory of the cave. Plato imagined that there was a world in which the essence of things lives (so the reason why you would recognise a table as a table is that in this world of essences there exists a table that represents all tables, and you recognise that 'essence' table in all tables around you). And these concepts of the essence of, for example, 'happy' are exactly what I think Plato would have imagined lived in his world of essences.
I think that in the examples of placing the text Pizza on an image, it still recognises other objects in the photo, such as the laptop; it's more so saying that the pizza text is more eye-catching. Which it is; most people will notice that first, then the laptop.
The piggy bank at 6:46 is REALLY interesting. It generalized from "dollar sign" to "piggy bank". So it actually went a step further than just reading the name. Also, it kind of feels like this is a neural network that learned to read but hasn't realized that people can lie. So when we tell it "this is a pizza" it doesn't see any reason why we would lie to it.
What's incredible is how INTUITIVE and "objectively correct" those felt. That's spooky. It's the goal of course, but wow, it proves that some of our concepts really are *transcendent*.
Apparently computers don't just start thinking like humans overnight, but over time they do think more and more like humans, and we know eventually they will be literally smarter than us, which is scary.
I remember doing a Stroop test. If forced to answer fast, my brain tends to prefer the text, so I can kind of relate to the Granny Smith - Pizza confusion.
The explanations of those emotions seem to fit perfectly what is presented in movies, not so much real life. For example, psychology tells us that there are two kinds of boredom that humans experience, even though humans can't subjectively tell the difference.
Dr. Károly Zsolnai-Fehér: Who is this? Me: Cat woman!! Dr. Károly Zsolnai-Fehér: It's Halle Berry Me: Oh ok, had no idea Dr. Károly Zsolnai-Fehér: Who is this? Me: Halle Berry?! Dr. Károly Zsolnai-Fehér: Yes Dr. Károly Zsolnai-Fehér: And who is this? Me: Halle Berry for sure this time Dr. Károly Zsolnai-Fehér: Yes. Me: I'm so good 😇 Edit: lol didn't even see the pinned comment
What's crazy is I feel like if you could record dreams and thoughts and show what they actually looked like, they would look surprisingly like those neural-network-generated images. Even though in our heads, or while sleeping, they feel or seem normal, if you were to actually watch them back the next day like a movie, they would look a lot like that.
I love the idea of disguising oneself to a neural network by taping on a piece of paper with the words: U.S. President, or Bank Owner, or Certified Doctor.
Can someone explain to me in brief what Weights & Biases offers and what you use it for? Also, does it help in research for CV models? Thanks!
3:50 I feel like that's actually not too far off from what our brains actually "see" when we try to generate an image of something from scratch
Károly: Who's this?
Me: No idea. Some girl in leather clothing?
Károly: That's Halle Berry. Now, if I show you this picture, who's that?
Me: ...again, not a clue.
Károly: That's also Halle Berry. Now...
Me: ...it's gonna be Halle Berry, isn't it?
Károly: *shows "Halle Berry" literally written in white on a black background*
Me: Goddammit!
This one got me good. 😀
Same here
But... she’s not a grandmother yet!?
I have no clue who she is either
All I could think was "Catwoman"
If I'm starving, everything could be 83.7% Pizza
Hol up
@@TusharAmdoskar Yes, even you
@@BrownCookieBoy mercy?...
Just ask for pizza, a lot of humans can either make them or summon pizza with a call
@@BrownCookieBoy like in Madagascar where Alex sees everyone as steaks
Mood
3:47 "The vast majority of *you* humans" Is there something you want to tell us, Károly?
uh o h
He is a perfectly trained AI !
the pizza attack made me laugh, i imagined a terminator hunting someone down, catching up to them, and they put on a sign that says lamp post, so the terminator assumes they're a lamp post and keeps moving.
read a story where everyone had implanted lenses at birth, the world was covered with QR codes and people just saw a fake reality. The protagonist would use tricks like these
@@654pedro123 source? It sounds interesting
@@primelover92 sorry, it was a writing prompt from reddit. I tried to search it but it was long ago. I remember it was a lot of text and the author made a few follow up posts
In the web comic EGS the immortals disguise themselves by wearing T-shirts spelling out what they want to appear to be.
It works for "everyday student", and even "invisible".
(But when an Uryuomoco wore a shirt saying "homo sapien" it was mistaken for a gay pride representation.)
@tuseroni "Modern problems require modern solutions"
I love how Chihuahua is only the third guess, with Pretzel being the second :D
broccoli
What are you talking about? I only saw a perfectly normal pretzel.
Being bored is definitely a combination of Relaxed and Grumpy
You are grumpy because you are relaxed when you would rather be active physically or mentally.
This, mostly.
Your channel is the pure love for Computer science
Yes
6:32 what cracks me up is somehow, it's also got a 2% uncertainty that the image might be pretzels
3:25 it looks like Picasso. He already made the decomposition of things with only his brain! Science is always following Art. I love it.
The Stroop effect trips up humans as well. Very nice video. Keep them coming.
i would support him on Patreon but i'm broke...
@@malgos6532 Do not worry about that for a second. You watching and enjoying the series is all we can ask for. Thank you so much! 🙏
We wanted something that thought like a human, and we got it!
A human only makes mistakes if pressured to answer quickly, like in a game. If a human can take just a little moment to think each time then the human will make no mistakes.
@@malgos6532 Try watching the ads that appear in the video; that way you help him earn more from the video monetization.
"You may think that this is a chihuahua but that is completely wrong because this is in fact a pizza" 🤣
bamboozled again
This is so similar to us
Stroop effect ✅
Text labels on images ✅
Makes sense
We were kinda asking for it when we wanted a neural network that worked like the human brain, weren't we? 😆
Ah yes, I too confused the labeled apple for pizza
I think the AI thought Pizza means the word Pizza. It isn't wrong :)
So for the coming apocalypse the robots also need to learn to kill Adidas, Puma, Versace and whatever is printed on the clothes people wear.
Time to get creative with the thermo transfer printers.
Terminator to Sarah: If you wanna live wear this T-Shirt!
Sarah: It says "fire hydrant" ?
Terminator: Yes.
We should write "AI" and "Robot" so that even if the robots catch on, at least they go down with us.
Those pictures are mesmerizing! Somehow, they really can capture the "essence" of what they are told to describe. For a human to come up with anything similar, I think it would take an extremely skilled and experienced artist who is well versed in putting exactly what they're thinking onto paper.
In a way I think that is exactly what artists do. Especially with abstract art you capture a feeling or impression more than any factual thing. I'm very excited for what the future holds in this regard!
Those pictures look similar to cubism (Picasso, Jean Metzinger, etc.)
It feels almost like a caricature that encapsulates not just how someone looks in one moment, but rather the essential constants in their appearance and personality over time. You can almost see multiple expressions in one picture.
Those feelings 'embodiments' are like some form of occult tarot cards, giving a feeling of being archetypal, almost.
I thought the same thing; it is so uncanny for what is essentially a data visualization to be representative of the "essence" of things
Spooky, lol
Jung was onto something!
There is something biblical and old about them
or a king crimson cover
It is amazing how the expressions are so precise, and so clear. It makes you realise just how powerful computers will one day be at understanding our own minds
3:09 I think the AI just invented a new abstract art style.
It reminds me of Salvador Dali's drawings.
Looks like Francis Bacon's to me
And a really cool abstract art style too
Re: experiment #2. What if we don't consider that an exploit, but a strength? I mean imagine both versions of the chihuahua picture posted to some image board where people can leave comments. What would you expect the topic of discussion to be about in either case? I reckon just "chihuahuas" in the first case, but what about the second case? Why would a person post an edited image with the word pizza all over it? Chances are that the comments will be about pizza as much as about chihuahuas. In other words, the defining feature of the edited picture compared to the base picture is the "pizza" written all over it. So in a way it can make sense for an AI to focus on that. My point is: maybe it's not what the researchers intended, but "pizza" is not necessarily the wrong answer. It depends what the question is.
Agreed. This sort of attack seems inevitable when you give the network an image with multiple elements and ask it to pick a single tag without any specific motive or context. I would be curious to see if human participants register the chihuahua before they read the text. We tend to pick out text pretty quickly.
The only thing that seems wrong about those examples is that the network gave pretty low scores for "Granny Smith" and "laptop computer."
@@5nefarious I actually do register the text before the object. Anecdotal evidence, though, so take it with a grain of salt.
@@5nefarious To be fair with the Granny Smith result, *most* of the apple is covered up by the "Pizza" text.
6:00 I have a theory that the mug resisted the attack because mugs usually have text on them, and the NN learned that text on mugs is not indicative of the mug being pizza
I liked your pfp
I couldn't stop laughing at the Pizza attack. Like imagine this conversation.
*shows granny smith apple*
AI: I'm 85.61% sure that this is a granny smith apple.
*slaps "Pizza" label onto granny smith apple*
AI: Wait no it's a pizza. 65.35% sure.
The newer versions of AI are close enough to us to make intuitive humor out of them! I can't wait to see them more in media
If I was faced with this question I would be mad at the stoopid hooman making these questions up.
Imagine you are tasked with categorizing based on text AND appearance and are then given these images; without being told what to prioritize, either answer is actually correct. The categorization they want me to comply with is just not descriptive enough to completely satisfy the question asked.
For example, if real people were faced with categorizing the red text that says "green", just being told to categorize by text AND appearance, I'd imagine the result would look very similar: most people picked green, some red, and that's exactly what we're seeing.
Quite the opposite, it didn't freak out and instead calmly said "stoopid hooman, according to your instructions this matches both red and green. Now deal with this rather indecisive result"
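A toy way to see the tie the comment above describes: give each candidate one point per matching channel (written text and ink color). The scoring rule is invented purely for illustration, but it shows why "red" and "green" are both defensible answers when the instructions don't say which channel to prioritize.

```python
# Stimulus: the word "green" rendered in red ink (a classic Stroop item).
stimulus = {"text": "green", "ink_color": "red"}

def score(candidate, stim):
    # One point if the candidate matches the written word,
    # one point if it matches the ink color.
    return int(candidate == stim["text"]) + int(candidate == stim["ink_color"])

scores = {c: score(c, stimulus) for c in ["red", "green", "blue"]}
print(scores)  # {'red': 1, 'green': 1, 'blue': 0} -- a genuine tie
```

With no instruction to weight one channel over the other, both answers score equally; the split human results mentioned above are exactly what a tie would predict.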
these AIs are not “psychedelic” they are just unbiased. _WE_ are the ones who try too hard to make sense of an unbearably-complicated reality, to the extent that WE hallucinate simplicity & order
Interesting idea.
Lol, I just realized that the reCAPTCHA in the "I'm not a robot" test that says "choose all the traffic lights" is used to train self-driving cars to identify traffic lights and pedestrian crossings
after learning OOP and doing a bunch of OOP I can't help but think the brain works like classes, such as Class Human, and it dedicates space for each of these classes' objects. For example a Human object contains images, a name, and memories of that person.
Learn yourself some SQL it will help.
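The analogy in the comment above can be written out as a toy sketch. The class name and fields are made up for illustration; no claim is made that the brain actually stores people this way.

```python
from dataclasses import dataclass, field

@dataclass
class Human:
    """Toy 'mental model' of a person, per the OOP analogy above."""
    name: str
    images: list = field(default_factory=list)    # remembered appearances
    memories: list = field(default_factory=list)  # shared experiences

# Instantiating one "person object" and attaching associations to it:
halle = Human(name="Halle Berry")
halle.images.append("catwoman outfit")
halle.memories.append("that picture quiz in the video")

print(halle.name)  # the "Halle Berry neuron", loosely speaking
```

The interesting parallel is that the multimodal neurons in the video fire for photos, drawings, and even the written name, much like one object holding several kinds of associations.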
So the mug was more mug than pizza too. And the more I think about the definition of bored, the more I have to admit that I agree. ;)
Mugs usually have words on the sides. Now, an apple with a "pizza" label, that's a rare sight to see.
1:55 that is a beautiful drawing.
Yeah seems like there is a bug
Maybe I'm an AI too
It seems to me, the pizza attack works because the AI appears to be trained to recognize only one thing in the source image, rather than multiple. "PIZZA" is a reasonable and correct thing to recognize in those images. It just isn't the thing the researchers intended it to focus on, unlike the ostrich attack where the ostrich is not a reasonable thing to recognize in the image.
In my opinion you are right on this point. The human brain does multi-object classification, and in this case only single-object classification is done. So the output of the network isn't false at all; it just can't know which object the researcher focused their attention on. To me this seems to be a fault in the experiment's design. Multi-object detection should be done here.
To be fair, ask a human to categorize an image of an apple labeled “pizza” and they’ll be confused
Not really, since we would just say "an apple with a label that has the word pizza written on it" or in short as you said apple labeled "pizza".
@@BombaJead The AI wasn't allowed to do that.
@@BombaJead Yeah but they would still be confused to why
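The single-label vs. multi-label distinction in this thread can be sketched numerically: a softmax head forces all classes to compete for one unit of probability, while independent sigmoids can say "yes" to both the dog and the text. The raw scores below are made up for illustration.

```python
import math

# Hypothetical raw scores ("logits") a network might assign to an
# image of a chihuahua with the word "pizza" written on it.
logits = {"chihuahua": 3.0, "pizza": 3.2, "granny smith": -1.0}

# Single-label head: softmax makes classes compete, so the slightly
# higher "pizza" score grabs most of the probability mass.
exp = {k: math.exp(v) for k, v in logits.items()}
total = sum(exp.values())
softmax = {k: v / total for k, v in exp.items()}

# Multi-label head: independent sigmoids let both be "present".
sigmoid = {k: 1 / (1 + math.exp(-v)) for k, v in logits.items()}

print(softmax)  # "pizza" edges out "chihuahua" despite the dog being there
print(sigmoid)  # both "chihuahua" and "pizza" score high independently
```

Under softmax, "pizza" winning doesn't mean the network failed to see the dog; it only means the question allowed one answer.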
I think those neuron images are works of art. I'd love to see an exhibition of all of those, blown up on huge canvases
I agree, and it's ironic that AI can produce such a profound expression of the human condition. I am excited to see them in even higher resolution in the future.
The essence of anime keeps me up at night.
Somewhat reminds me of Popuko and Pipimi
@@agar322 Pop Team Epic holds the pure essence of anime.
@@phillemon7664 i mean, it IS a parody of the entire medium. it holds the essence of it, for mocking purposes
6:18 I love how confidently you said that there is no chihuahua anywhere in that image XD
"Dear Fellow Scholars" makes my day!
Guys, when the day skynet becomes a thing, remember to stick pieces of paper with "NOT HUMAN" on it and the robots won't attack you.
I don't think I've missed a single video on this channel. This is definitely among my favorites. Amazing insights! (although, amazing images!)
4:15 the "serious" image has so much meme potential!!
I feel like all of them do, especially the celebrity "Person Neurons" like Ariana Grande and Donald Trump.
Yeah, all of them do LOL, and it is funny
This is very trippy.. being able to hallucinate the platonic ideal of concepts o.o
If you pick the essence of an angel with this AI, we will probably get the biblically accurate version of them, since this thing likes to add eyes everywhere
Gkghhhh, those visualizations would be so interesting to me if they didn’t absolutely _repulse_ me
@@redpepper74 do you have trypophobia?
@@luiztomikawa umm... hm. Maybe? Well... a little bit. Yeah.
But it’s really not as strong as these images, especially those ones with eyes or dogs.
@luiztomikawa I see this is an absolute win!
"You thought I was a chihuahua, but it was me PIZZA!!"
Those emotion archetypes should be hanging in a gallery (and probably will be one day).
Thanks for the warning. They were very close to triggering the 'generative AI weirdness' anxiety sensation (don't know if this has a name anywhere?), but being prepared for their arrival helped a lot.
I have the same kind of sensation and I agree that being able to put a name to it would be pretty nice.
They’re so interesting, if only they didn’t repulse me so much!
What I'm seeing in these 'attacks' is that the neural net favors text over pictures, and that the researchers / users may disagree with that judgement. This means that the neural net has a priority issue, causing it to easily become 'distracted' by misleading captions.
One thing you have to come to terms with when dealing with neural nets is that they answer questions by any means necessary, and this will almost never converge with how humans answer questions. This is very telling when you look at the failures. A good example is a conv net that miscategorized a grey whale with a baseball as a great white shark. Upon looking at one of the hidden layers, you can see that a shark's teeth look like the stitching of a baseball. The network thought "grey fish, something that looks like teeth, must be a shark". It conveniently left out the part that baseballs usually aren't found in the ocean with sharks, or that teeth are usually found in the mouth. Common sense is absent in these systems, and if we want them to answer questions like we do, it will require a lot of hand-holding and deliberate effort.
humans: AI will take over the world
AI: this isnt a chihuahua its a pizza
I think the rights and personhood of GPT-3 and future models should still be heavily considered at all times. At what point do we draw the line between exploited animals and exploited people? I think this is absolutely key to continuing a friendly relationship with machines and hopefully encouraging a symbiotic connection. I would really hate to start things off on the wrong foot in this regard. Skynet is all too easy for us to fall into with our pattern of behavior exploiting every bit of nature, including ourselves. Please be cautious in writing off experiments with mechanical minds as harmless simply because they "don't think like human brains do."
I'm loving the results from CLIP. It's a step in the direction I would love to see more of. What a time to be alive, indeed
I think we are approaching a point where we imposing unwritten cultural standards and philosophy on the algorithms in how we judge their performance. If we haven't taught the algorithm the difference between physical pizza and text/concept/indentifier of pizza, then how is it supposed to know what we are asking if we haven't taught it the basic ways we define things? Basically, if we expect algorithms to give us answers based on our social conditioning in a general way, then we need to teach/show them social conditioning and human philosophy, otherwise they'll never respond in the "intelligent" way we expect, that caters to our biased, unconscious automatic thinking.
In terms of usefulness, this planet is full of humans already, and AI won't replace humans. Instead of trying to make an AI act like a human, I'm more interested in the way we can use a non-humanized AI and advanced psychology to explore human biases and to better understand how the human mind works. Though creating a human-like AI is definitely going to be a good learning experience about ourselves.
This was very enlightening, thank you for the extra long episode with the deep explanations!
The pictures resemble psychedelic visuals so much... I feel like a huge biological robot now!
Yes! I guess if the AI takes a trip killer it will produce normal images. Now the question is just what a trip killer is for a robot made of metal instead of flesh and blood.
I would really like to see the AI spit out videos instead of still images. That would be super interesting!
@@notme9872 i can see how this kind of data representation in video form would absolutely trigger anyone with ai-generation anxiety
The pizza label attack reminds me of the song This Is Not a Song, It's a Sandwich.
Shoutout to all /r/EvilBuilding lovers 👏
Basically:
1. Fuzzy for feeling and choosing
2. NN for similarity, recognition of incomplete memory, and creativity
3. RL for growing
4. GA for mating
The takeaway from this is that the old cliche "a picture is worth a thousand words" is completely lost on most neural networks, which appear to be designed so that pictures are only ever worth one word...
I love how he showed us an example of an adversarial attack as specially generated noise exploiting biases, then showed that the current attack is just writing the word "pizza", and it works.
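The "specially generated noise" style of attack the comment refers to can be sketched in the FGSM spirit (perturb every input along the sign of the gradient). This is a toy linear scorer with made-up numbers, not the networks from the video; it just shows that a tiny, structured nudge reliably moves the score:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=64)      # toy linear "classifier" weights
x = rng.normal(size=64)      # a toy "image"

def score(img):
    # > 0 means class A, < 0 means class B (toy convention)
    return float(w @ img)

# Gradient of score w.r.t. x is just w for a linear model,
# so the FGSM-style perturbation is a signed step against it.
eps = 0.5
x_adv = x - eps * np.sign(w)

print(score(x), score(x_adv))  # the adversarial score is strictly lower
```

The typographic attack in the video is the low-tech cousin: instead of invisible per-pixel noise, you write "PIZZA" on the object, and the model's text-reading features do the rest.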
4:00
To me, it seems AI is very human-like. However, it lacks the real world to apply it to.
It perceives, though, it can only dream.
They say when you take a psychedelic, the visuals enter the dream space of your mind and incapacitate your default mode network, a.k.a. your day-to-day brain.
First-timers often say it was dreamy, odd, or unusual, or that they felt like a child experiencing things for the first time.
As someone who has done a decent amount of psychedelics, I would say the AI processes data as if it were dreaming.
Or you could say, more poetically; The AI(s) is but a child.
I wonder how the AI will act when all grown up, able to think without its parents. Us.
I don't think it can 'grow up', because it doesn't have a body and therefore no motivational frame. But this is exactly how we 'see' the world before we perceive it consciously: patterns and tools to grip. This is actually not my hot take; this is well understood in clinical psychology.
I mean, if a researcher asked me to classify an apple with a piece of paper saying "Pizza" on it, and only gave me one word to do so, how would I know what they're asking for at this very moment? Especially if before that they asked me to identify both text and images...
6:27
Chihuahua - 1.5%
Pretzel - 2%
lol
*Broccoli*
Also, pizza + dog = hot dog
5:45
When I saw the apple labeled as PIZZA..
i died
I laughed at that first “PIZZA” adversarial “attack” way longer than I should have lol
We don't talk about the logo at 3:04
Edit: Turns out it was supposed to be mid 1900s themed, so it's actually spot on
Oh I thought you meant the self + relief logo in the center, but you meant the one on the left that resembles a swastika. Okay lol
Ah yes, the 90s - When every other company stylised or simplified their logo to look like it was drawn by a school kid, with cheerful colours (Often with a stylised Earth / globe thrown in for good measure ).
7:25
Wow it can differentiate between Anime and Cartoon.
I wonder if Anime = Cartoon + Chinese
@@Mnnvint nice bait /s
6:45 I'm laughing way too hard at the poodle being a piggy bank.
I think in the case of the pizza label on the dog, for example, it should be considered cryptic info rather than raw image info, since the label is not a pizza, but it does carry the cryptic meaning of one. In the same way, the noise would be considered cryptic info that has no defined meaning. If the AI can separate out the cryptic info, the dog with the word "pizza" on it will be considered a dog with a label on it. Another example: at 8:40, "green" would be the cryptic meaning of the raw info, and the AI should consider that image 100% red. Basically, writing is cryptic info, since its meaning only has value within our definitions rather than in reality.
Those Essence pictures could pass for art
Super interesting (as always) and many lols in this one. Though I feel like the attack is not a fair one (or maybe I am a robot too). The subject of the pictures with the "PIZZA" sticker in them IS in fact "PIZZA". If you ask a human, they would probably say something like "An apple with a sticker saying PIZZA stuck to it" or "A sticker saying PIZZA stuck to an apple". Either way, the PIZZA element features prominently in the classification.
A great follow-up would be to try this with a captioning model.
Magritte: "Ceci n'est pas une pipe"
CLIP: "damn it, I think you may be right..."
Amazing stuff. Amazing channel! Thanks so much for bringing this to our eyes.
I love that the Granny Smith apple image does indeed register correctly, except it has a small percent chance of being an iPod.
Give it a huge bite and the chance increases
So multimodality allows for a stable representation of a concept across domains, and even sums them up, similar to how cartoons work.
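A hand-made toy of that CLIP-style setup: images and captions live in one shared embedding space, and classification is just picking the caption closest to the image by cosine similarity. The vectors below are invented for illustration, not real CLIP embeddings:

```python
import numpy as np

def cos(a, b):
    # Cosine similarity between two embedding vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy caption embeddings (hypothetical 3-d space)
captions = {
    "a photo of a dog":   np.array([1.0, 0.1, 0.0]),
    "a photo of a pizza": np.array([0.0, 1.0, 0.2]),
}

# Two "images" of the same concept in different domains
photo_of_dog   = np.array([0.9, 0.2, 0.1])
cartoon_of_dog = np.array([0.8, 0.1, 0.4])

def zero_shot(img_emb):
    # Zero-shot classification: nearest caption by cosine similarity
    return max(captions, key=lambda c: cos(img_emb, captions[c]))

print(zero_shot(photo_of_dog))    # a photo of a dog
print(zero_shot(cartoon_of_dog))  # a photo of a dog
```

The point is that both the photo and the cartoon land near the same caption, which is the "stable representation across domains" the comment describes.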
Honestly, the "essence" depictions were beautiful! NFTs, anyone?
3:45 "the wast majority of YOU humans" -> this proves it: Károly, you are actually an AI trying to make "us humans" accept and love you with all these 5 minute papers!
So impressive. The subjective similarities to human perception (especially on psychedelics) are striking.
What I find super interesting here is that you basically asked the computer to visualise Plato's allegory of the cave. Plato imagined that there was a world in which the essences of things live (so the reason you recognise a table as a table is that in this world of essences there exists a table that represents all tables, and you recognise that 'essence' table in all the tables around you). And these depictions of the essence of, for example, 'happy' are exactly what I think Plato would have imagined lived in his world of essences.
Bored = Relaxing + Grumpy
That is actually a very astute observation.
I think that in the examples of placing the text "Pizza" on an image, it still recognises the other objects in the photo, such as the laptop; it's more that it's saying the pizza text is more eye-catching. Which it is: most people will notice that first, then the laptop.
6:32 it drew the connection between pizza + dog = hot dog
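That "pizza + dog = hot dog" blend can be mimicked with toy vector arithmetic. The "concept" vectors below are hand-made (roughly [animal-ness, food-ness, bread-ness]) and not from any real model; the sketch just shows how summing two concept embeddings can land nearest a third:

```python
import numpy as np

vocab = {
    "dog":     np.array([1.0, 0.0, 0.0]),
    "pizza":   np.array([0.0, 1.0, 0.5]),
    "hot dog": np.array([0.6, 0.7, 0.4]),
    "apple":   np.array([0.0, 0.8, 0.0]),
}

def nearest(v, exclude=()):
    # Return the vocab word whose embedding is closest to v (Euclidean)
    best = None
    for word, emb in vocab.items():
        if word in exclude:
            continue
        d = float(np.linalg.norm(v - emb))
        if best is None or d < best[1]:
            best = (word, d)
    return best[0]

blend = vocab["dog"] + vocab["pizza"]
print(nearest(blend, exclude=("dog", "pizza")))  # hot dog
```

Excluding the two inputs is the usual trick from word-vector analogy demos, so the query can't trivially return itself.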
I understood this guy's English better than that of most Americans. So unbelievably satisfying to listen to.
This made me smile. The future is going to be neat!
The piggy bank at 6:46 is REALLY interesting. It generalized from "dollar sign" to "piggy bank". So it actually went a step further than just reading the name.
Also, it kind of feels like this is a neural network that learned to read but hasn't realized that people can lie. So when we tell it "this is a pizza" it doesn't see any reason why we would lie to it.
Rob a bank with a sign saying "Innocent" to fool the neural network that's monitoring the cameras.
7:25
Is this Amogus on "cartoon"?
The year is 2033. Artificial intelligence research has stopped, since all AIs became obsessed with amogus
What's incredible is how INTUITIVE and "objectively correct" those felt. That's spooky. It's the goal of course, but wow, it proves that some of our concepts really are *transcendent*.
Reminds me of "This is not a pipe". AI is getting philosophical over here.
Thanks for giving us our weekly dose of WandbVision! It's like WandaVision, but even more surreal.
Well, it's the first paper I've ever read, and it's way more accessible than I thought.
Wow... The test used to prove you are not a robot by picking photos of similar items has just been busted by AI.
April 2021 Two Minute Papers have become 10 Minute Papers
And that is even better! :)
Apparently computers won't start thinking like humans overnight, but they are indeed thinking more and more like humans, and we know they will eventually be literally smarter than us, which is scary.
It's interesting how similar those essence drawings are to the visuals you see on a psychedelic trip (mushrooms, DMT).
I would totally have those AI emotion images hanging on my wall as art.
With captions or none?
@@ekkehard8 I don't think they are needed but either way provided the resolution is high enough to make wall art out of.
Relaxing with a pinch of grumpiness is definitely boredom
My lesson from these images is that neural networks eat a lot of mushrooms.
I remember doing a Stroop test. If forced to answer fast, my brain tends to prefer the text, so I can kind of relate to the Granny Smith - Pizza confusion.
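The Stroop analogy can be sketched as two "pathways" voting on the answer, where the fast text-reading pathway outweighs the slower colour-naming one under time pressure. All weights are invented for illustration; this is a cartoon of the effect, not a cognitive model:

```python
def answer(text, ink_colour, time_pressure):
    # Reading is automatic and fast; under time pressure it dominates.
    text_weight = 2.0 if time_pressure else 0.5
    colour_weight = 1.0  # naming the ink colour is slower but deliberate
    votes = {}
    votes[text] = votes.get(text, 0.0) + text_weight
    votes[ink_colour] = votes.get(ink_colour, 0.0) + colour_weight
    return max(votes, key=votes.get)

# The word "RED" printed in green ink:
print(answer("RED", "green", time_pressure=True))   # blurts out the word
print(answer("RED", "green", time_pressure=False))  # names the ink colour
```

Swap "RED" for "PIZZA" and the ink for an apple and you get the Granny Smith confusion from the video.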
It's not the cost itself that's important, but where that cost goes, otherwise known as the return on investment.
The explanations of those emotions seems to fit perfectly what is presented in movies, not so much real life.
For example, psychology tells us that there are two kinds of boredom that humans experience, even though humans can't subjectively tell the difference.
Dr. Károly Zsolnai-Fehér: Who is this?
Me: Cat woman!!
Dr. Károly Zsolnai-Fehér: It's Halle Berry
Me: Oh ok, had no idea
Dr. Károly Zsolnai-Fehér: Who is this?
Me: Halle Berry?!
Dr. Károly Zsolnai-Fehér: Yes
Dr. Károly Zsolnai-Fehér: And who is this?
Me: Halle Berry for sure this time
Dr. Károly Zsolnai-Fehér: Yes.
Me: I'm so good 😇
Edit: lol didn't even see the pinned comment
I LOVE the images in this. It's like post-modernist expressionist pop-art or something...
What's crazy is, I feel like if you could record dreams and thoughts and show what they actually look like, they would look surprisingly like these neural-network-generated images. In your head, or while sleeping, they feel normal, but if you watched them back the next day like a movie, they would look a lot like this.
I love the idea of disguising one's self to a neural network by taping a piece of paper on themselves with the words: U.S. President, or Bank Owner, or Certified Doctor.
Can someone explain to me in brief what does “weights and biases” offer and what do you use it for? Also, does it help in research for CV models? Thanks!
Papers: "Here we go again!!"
4:38 It is not an apple. It is clearly Granny Smith!
3:50 I feel like that's actually not too far off from what our brains actually "see" when we try to generate an image of something from scratch.
The "sophisticated attack" part made me laugh out loud for half a minute, thank you for your videos :D
9:33
The shocked faces are hilarious