so like how it yeets the packages away like they're too hot to touch 😂
Maybe it's really tricky to hold anything without dropping it. So you drop it before gravity does.
Perfect human package handler simulation.
the problem is rewarding package velocity. it does lead to the robot moving the package into the box as quickly as possible, but that's not exactly what you want.
@@marc_frank Seems easy to fix by bounding the maximum allowed velocity for objects and doors.
@@marc_frank The key is to make AIs not be sociopaths. Then again, nature never really got this right in billions of years.
I think if they had given negative points whenever the package exceeds 2 g of acceleration, it would have already handled the packages very carefully (see the sketch after this thread). I totally agree that the current handling seems on par with FedEx and UPS.
They'll probably do something like that to improve it later on.
Physics in an engine and physics of the outside world are not the same. Not now, at least.
But yeah, something along those lines could work
@@gustavbruunkjr5123 They probably already did it but they wanted to show these less good videos first xd
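A minimal sketch of the penalty ideas from this thread, assuming per-step access to the package's simulated acceleration and velocity; the 2 g limit, velocity bound, and weights are illustrative guesses, not from the paper:

```python
import numpy as np

G = 9.81  # standard gravity, m/s^2

def package_reward(progress, accel, velocity, v_max=2.0):
    # Keep the original "move it fast" incentive...
    reward = progress
    # ...but subtract points for rough handling (the 2 g idea above)...
    if np.linalg.norm(accel) > 2 * G:
        reward -= 1.0
    # ...and for exceeding a hard velocity bound (the reply above).
    if np.linalg.norm(velocity) > v_max:
        reward -= 1.0
    return reward
```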
I agree about using these types of robots for last-mile delivery. Big issue with delivery robots is that they get vandalised or robbed, these robots look like they can stand on their hind legs and punch anyone who tries something funny.
just a question: is the package on the front or on the back? i mean, where can it go without obscuring the camera or the sensors?
Maybe we don’t train the robots how to fight?
"i DID NOT MURDER HIM!" lol
Maybe the robots train themselves how to fight
Or they can be trained to keep fifteen feet away from any human. You could still trap them. But it's a start I guess.
After seeing so many Boston Dynamics videos of robots moving in clean, precise ways, it’s somewhat refreshing to see a robot that just sort of makes it up as it goes along. At the risk of anthropomorphizing, it feels like it has much more personality.
That's the funny thing about AI and robotics. We have all these models of them as "alien intelligences", but as it has developed more and more, it seems to be converging on quasi-human like traits you wouldn't have predicted from the 20th Century pop culture perception of BLEEP BLEEP straw logic robots. It can even become addicted in its own special way.
@@ForwardSynthesis Yeah, isn't that neat? Our own intelligence evolved to learn from and respond to the environment, resulting in certain behavioral characteristics. And now that we're making intelligent systems ourselves, our own human quirks seem to be more like features and not bugs.
yes it seemed more alive. Exciting stuff
Yes, because it’s not scripted, it’s organic.
I like to imagine this robot just going through a building, opening as many doors as possible
Its movement is crazy smooth. I've always thought that once robots really get good, their movement won't look "robotic" at all.
I'm impressed by how lifelike those motions are. It looks just like a dog or a kid playing.
I mean it’s kind of the same thing
@@zeblanoue2642 that's a large stretch
Remember that we are also machines of a different construction. We will see a lot of qualities that we imagined were unique to life, simply because we have never beheld life emerge in unfamiliar places.
@@Redfrog1011 not really though? I mean people have already been arguing where AI sentience comes into play. Most scientists agree there are some life forms without sentience. Who's to say it's not a very basic form of artificial life?
@@IiiiIiiIllIl it doesn't have enough neurons to be "sentient" per se
Consider the similarities between the robot stopping at the TV, and Yul Brynner’s gunslinger character stopping at the sight of flames in the original WESTWORLD (1973). The rationale behind both cases is eerily grounded.
Oh so the robot stops at the TV and it's grounded but then I do it and I'm lazy 🙄
Grounded in lazy leisure, if you want to be very specific! @@Mega-P71
@@Mega-P71 I anthropomorphise my computer devices all the time. “Lazy fucking piece of fucking shit”
The original Westworld had a lot going for it. Those moments with that robot were amazing.
Curiosity as seeking new patterns is not what we really want; what we really want is curiosity as seeking new ways to control the environment. Now, training an agent to seek control leaves me with a bit of an uneasy feeling. But that's what it really means, what child curiosity is about: not just finding new things but actually using them. As a scientist, focused on discovery all the time, it's probably easy to forget this other vital part of curiosity.
Man, just a few months ago, I felt like the biggest lesson was that AI was going to dominate the world of information, leaving us dominating action in the physical world. I felt like we would have a fair amount of time to adapt to that new reality. Now it seems like robotics will be proliferating and able to excel in real-world tasks very soon.
yup, it's crazy to see in action how synergistic these two domains actually are; it seems the AI and robot revolutions are one and the same. a few more years and we'll probably actually start seeing these as part of our daily lives.
it's still going to be quite a while
I, Robot takes place in 2035.
I feel like we'll have similar robots in the next few years excluding the sentient bot
technology grows exponentially
once it clicks, there's nothing stopping it from dominating us
I mean, that’s a good thing.
I was just thinking about AI becoming addicted to something, and you literally showed that. Maybe in the future we will be able to find a method to cure human addictions through AI.
couldn't you combine a dopamine suppressor with negative reinforcement to break an addiction?
in all seriousness though, addiction takes a lot of different forms. suppressing the dopamine receptors would just make the person more likely to kill themselves, as it would take any pleasure from their lives.
when dealing with addiction you need to separate chemical dependence from behavioral addiction. the chemical dependence is easy enough to break with controlled tapering; the behavioral addiction is a web of social and psychological issues you kinda have to work through, and sometimes you just CAN'T do anything about it. you can't just tell someone to cut out everyone in their lives or to change everyone in their lives, and for some people everyone in their lives contributes to their addiction.
for some addictions, like porn, overeating, or gambling, you can make good strides by addressing impulse control. we have found areas of the brain that, if inhibited, can cause someone to lose impulse control, so logically, if you could fortify that area, a person would have better control over their addiction.
@@tuseroni6085 that is true, but it all has one huge flaw: a human always has the option to remove the negative reinforcement and remain addicted. So if an AI has a method to remove whatever is stopping it from getting the score for the action it is addicted to, then it will do it. What will the AI do in such a case to break the cycle?
trauma = addiction
Most addictions are about hacking the reward function, but social media/TV addiction seems to be about the inherent drive of intelligent agents to self-improve by gathering information. I doubt these insights would translate to other addictions.
This seems to be the same between robots and people after all: I have noticed that even in states of complete apathy, curiosity is the only desire that does not go away.
@@sanketvaria9734 "A human could just remove the negative reinforcement [curing their addiction]"
Off-topic, but I think this is short-sighted.
From personal experience in volunteering, most people with addictions really don't want to be addicted.
They're very aware that addiction takes them away from other things they care about (e.g. their relationships and dreams). But certain drugs hijack the reward centers, so their desire for the drug is just as powerful as the love for their children.
I truly believe that most people couldn't quit a desire that strong, and would lose a battle with addiction. Like Schopenhauer said, we're free to do whatever we desire, but we're not free to choose our desires.
So if there was a machine that could remove your desire for a drug, I think 99.9% of people with an addiction would use it. (And then have no need to reverse it, because they'd no longer have any desire to be on the drug).
First: create a "matrix" for the bots. Second: place them in it to learn. Third: they place us there instead. ...
Does anyone remember "Return to Oz"?? These things look just like the 'wheelers', ha ha. They used to give me nightmares as a kid. I look forward to new nightmares thanks to the miracle of AI.
Are we living in an AI training sim for our real copies somewhere?
Scary thought.
yup
Congratulations you just invented religion.
Joke's on them, they just trained an AI to sit on the couch drinking coffee and watching YouTube.
Your move simulation programmers, your move.
that's what the afterlife is
Thank you for showing us these papers in a way that average people can watch and enjoy
I couldn't get this from long academic videos
they even surpassed delivery guys. What a time to be alive!
the box yeeting is very accurate to real world delivery.
I absolutely love that the robot has wheels at the tips of its limbs. I've been wondering for years why robotics experts keep trying to balance robots on two legs; wheels are way more efficient and easier to balance on (if they're not free-rolling, but controllable, of course).
Because wheels aren't the best when there are bumps or elevations or any surface that's not designed to be driven on. From a biomimicry design perspective, no animal has ever evolved wheels.
If this robot were to be able to climb stairs, it would have to lock its wheels and behave like a quadrupedal robot...
@@chatboss000 so you're telling me wheels can be locked and behave just like regular legs? Wow, I'm having some trouble seeing the downside.
@@danfg7215 Locking the wheels would be super easy and is a good idea. I was just talking to a coworker about the possibility that having wheels for hands might enable it to do other things, like spinning a wheel up really fast and touching it to an object to shoot it across the room, instead of carrying it or having to do the two-hand toss like it did. At one point the simulated robot jammed the door handle in the spokes so it could pull on it; combining that with wheel rotation would open up a lot of interesting interactions.
I doubt a rounded shape will require less energy to be kept in place... you then need constant momentum or corrections that require energy
@@danfg7215 Until we invent the self-healing robot, those wheels are additional complexity - additional mechanical points of failure. The wheels can be locked... until there is some malfunction, and maybe the wheel falls off or spins freely and your robot slips down the stairs.
“This robot has learned to deliver your package with maximum efficiency!”
Robot: *YEET!*
Holy sh*t. It’s like a real life Bumblebee !! 🐝 🤖
I'm happy they managed to make it do something not so easy (especially with its form), but imagine having tire marks everywhere on your doors. I think I'll stick with Optimus.
Yes, with Optimus you will get your robot in twenty years, when Elon admits he was a liar, from a studio that looks like Mars!
That was actually a problem when the robot first started to drive and did tank-like steering. It tore up the entire floor, so they ended up using a different gait for steering xD
@netherportals tbh the worst part of a Mr. Handy is the nuclear reactor blowing a huge flame downwards 24/7, burning everything that gets close, smoking the pavement, making any room incredibly hot really quickly, being noisy, and capable of seriously injuring anyone who gets close.
This robot just needs hands (keep the front wheels too, just let them retract onto the arm) and maybe a little appendage near the back wheels that can act as a foot for better balance when it has to walk or stand still. Then it's perfect.
Two days into 2024 and we have this...
Neat.
for handling the boxes, an object could be placed inside the box in the simulation, and a penalty could be applied to the reward for knocking over or breaking the object inside. This could be as simple as writing the penalty as a function of how much the object in the box is tilted. Likewise, penalties could be applied for excessive force on the box and its contents.
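A minimal sketch of that tilt-penalty idea, assuming the physics engine exposes the up vector of the simulated object inside the box; the names and weight here are illustrative:

```python
import numpy as np

def tilt_penalty(object_up, weight=0.5):
    # Angle between the object's up vector and world up, in radians:
    # upright gives 0, knocked over gives ~pi/2 or more.
    world_up = np.array([0.0, 0.0, 1.0])
    cos_tilt = np.dot(object_up, world_up) / np.linalg.norm(object_up)
    tilt = np.arccos(np.clip(cos_tilt, -1.0, 1.0))
    return -weight * tilt  # subtract more the further the contents tip over
```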
🎯 Key Takeaways for quick navigation:
00:00 🤖 *Introduction to an Incredible Robot*
- Introduction to a remarkable robot capable of exploring, standing up, and handling packages.
- Addressing the perceived impossibility of the robot's actions.
- Setting the stage for understanding the training challenges faced by real-world robots.
02:05 🎮 *Learning Inside a Simulation*
- Discussing the necessity of learning inside a simulation for robots.
- Highlighting the challenge of limited real-world training data for robotics.
- Introducing the concept of reinforcement learning through playing a video game.
03:47 🌐 *Translating Simulation to Real World*
- Explaining the process of translating learning from a video game simulation to the real world.
- Demonstrating the robot's ability to navigate, stand up, open doors, and handle packages.
- Emphasizing the importance of crafting reward functions for specific tasks in the real world.
05:10 🚗 *Potential Applications and Future Outlook*
- Speculating on potential applications, such as last-mile delivery and self-driving cars.
- Discussing the role of improved simulation environments in training AI agents.
- Noting the limitations of hand-engineering reward functions and the need for further advancements.
06:44 📈 *Limitations and Remedy*
- Acknowledging limitations related to hand-engineered reward functions.
- Introducing an earlier paper that addresses the generality of AI agents by teaching them to determine good scores.
- Encouraging the audience to explore experiment tracking, model evaluation, and production monitoring tools.
Made with HARPA AI
I can already hear it. Delivery Drivers: "THEY TOOK ARR JAWBS!!!"
These paper summaries are amazing, and I've been watching for years, but I honestly have to take a break until something changes in how we train models again. The reward system is currently the greatest hindrance in my opinion, and everything I am seeing now seemed entirely possible 2-3 years ago with more time and money.
It seems Tesla's approach is the closest we will get to something extremely versatile. They use a HydraNet, which is essentially hundreds of specialized neural net tools, similar to our own brain: an LLM for language, transformer-based image recognition, nets for logic and reasoning, nets for pattern matching, nets for motion and movement. Our brain is made up of different cores that are essentially tools for our consciousness to use. I frequently see robotic AI systems using just one type of architecture scaled to enormous size, which probably isn't the right way to achieve AGI.
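For what it's worth, here is a toy sketch of the shared-backbone, many-heads idea that comment describes, in PyTorch. This is not Tesla's actual HydraNet; the task heads and dimensions are made up purely for illustration:

```python
import torch
import torch.nn as nn

class HydraNet(nn.Module):
    def __init__(self, in_dim=512, feat_dim=256):
        super().__init__()
        # Shared trunk: computed once per input, reused by every head.
        self.trunk = nn.Sequential(nn.Linear(in_dim, feat_dim), nn.ReLU())
        # Specialized heads, one per task (names are illustrative).
        self.heads = nn.ModuleDict({
            "detection": nn.Linear(feat_dim, 10),
            "depth": nn.Linear(feat_dim, 1),
            "motion": nn.Linear(feat_dim, 6),
        })

    def forward(self, x):
        feats = self.trunk(x)
        return {name: head(feats) for name, head in self.heads.items()}

outputs = HydraNet()(torch.randn(1, 512))  # dict of per-task outputs
```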
Remember the red flashes in those build a bridge games? Add that to the score and we'll have a gentle robot in no time, I think
This is how reinforcement learning for some autonomous car systems is being done!
I’m imagining the robot’s joy as it finds new ways to make a person’s body move in different ways 😮
With a few modifications to the robot you can have a Tachikoma.
Just remember to feed one of them natural oil.
great idea. kind of like the Boston Dynamics ones, but with wheels. i bet it saves a lot of energy in most situations with those wheels 👍
An easy solution to the rough handling of the boxes is to add acceleration to the reward function. If the box accelerates or decelerates too rapidly on any axis, the robot would be punished for it. This would help resolve cases in which it moves the box too quickly, or in which the box decelerates harshly by hitting the bin rather than being gently placed inside. This variable would need to be adjusted based on how careful you want the bot to be. If it is rewarded too much for being gentle, then it will take too much time gently moving the box, making it inefficient in time-sensitive processing. If it isn't rewarded enough, then it will prioritize speed to obtain an optimal reward and likely disregard the punishment it receives for throwing the box. I absolutely love the finesse of adjusting reward variables and seeing a model adapt to the new incentives.
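A minimal sketch of that tuning knob; `gentle_weight` is the variable the comment describes adjusting, and all the names and values are illustrative assumptions:

```python
def handling_reward(progress, box_accel, gentle_weight=0.3, accel_limit=5.0):
    # Only harsh acceleration spikes above the limit are punished;
    # gentle_weight sets how expensive they are relative to speed.
    # Too high and the robot dawdles; too low and it throws boxes again.
    penalty = max(0.0, box_accel - accel_limit)
    return progress - gentle_weight * penalty
```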
Imagine this thing delivering you a package and throwing it in your face!
First video of the year!!!! Happy 2024 everyone!!
Looks like robotics will catch up in no time. Might be the last time in history we live without them among us.
as someone living in Brazil I would be very happy having this little robot taking care of my packages; they would have a better chance of surviving with this guy.
the way it climbs down the ridge looks very organic
Happy new year! What a time to be alive
This would make a great rover on other planets. Amazing. Always trying to make people obsolete at the same time.
lol @ delivery service joke 😆
Now I want a robot cat with wheel feet!
Imagine the terrain it could navigate if it were as limber as a cat AND can propel itself with wheels too!
They also treat the packages just like the average delivery driver. Impressive level of realism
‘Just as good as a delivery guy’ 🤣❤
Insanely smooth.
WOW.
And to think that we are seeing this on the very first day of 2024.
imagine a self-driving car that comes upon a strange IRL driving situation, quickly captures the relevant data, runs millions of simulated next steps to train itself on the best next option, all in an instant, and then proceeds using the newly optimized route
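That is roughly sampling-based model-predictive control. A toy sketch, where `simulate` and `score` stand in for a fast learned model of the captured scene; nothing here is a real self-driving API:

```python
import random

def plan_next_action(state, simulate, score, n_rollouts=1000, horizon=20,
                     actions=("left", "straight", "right", "brake")):
    best_seq, best_score = None, float("-inf")
    for _ in range(n_rollouts):
        # Sample a candidate plan and roll it forward in the fast model.
        seq = [random.choice(actions) for _ in range(horizon)]
        s = state
        for a in seq:
            s = simulate(s, a)
        total = score(s)  # e.g. progress made without collisions
        if total > best_score:
            best_seq, best_score = seq, total
    # Execute only the first action, then replan at the next step.
    return best_seq[0]
```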
Wow, the teslabot is looking great!
Oh, wait a second...
Yup. There's a lot of competition.
It's hilarious how much they doubt their abilities, only to exceed them without any human interaction.
Much smarter and more capable than anyone assumes at this time.
Super cool
I love this channel. Hear bits I might not catch in the papers, and get a good laugh. "What a time to be alive!" hehehehehe ... Thanks and Cheers ...
Kinda scary. You could easily teach a robot to kill this same way. 😮
I’ve had ideas about transferring the results of reinforcement learning to the real world to be emulated as a fast form of teaching robots how to behave, but there’s currently a problem.
If you watch the motions reinforcement learning puts out, they're super spastic due to the randomness introduced. I think it would be a good idea to also make an animation of what you want the thing to do as a baseline, and then award points based on how closely it follows the animation, balanced with completing the task.
In this way you can get the kind of human like or whatever motion you would like while still completing the task.
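A minimal sketch of that blended reward, similar in spirit to motion-imitation methods like DeepMimic; the weighting and pose representation are illustrative assumptions, not from any specific paper:

```python
import numpy as np

def blended_reward(pose, ref_pose, task_reward, w_imitate=0.7):
    # Reward staying close to the keyframed animation at this timestep...
    imitation = np.exp(-np.sum((pose - ref_pose) ** 2))
    # ...blended with actually completing the task, so the agent can
    # depart from the animation when the task demands it.
    return w_imitate * imitation + (1.0 - w_imitate) * task_reward
```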
2:42 - "Stephanie, input! Need input!"
5:24 the way it stands up would be jarring in person.
hm, curiosity is something that is a bit scary for an AI to develop... but also very cool. The robot design also seems quite functional! Very cool.
All the developments that are going on are scary. Fascinating, inevitable, but definitely scary as well.
"why did you murder that man?" "he got cut and red stuff came out, i wanted to see where it came from, so i ripped him open and i'll tell you, there was A LOT more where that came from. never did figure out where exactly it came from...seemed to just be everywhere"
@arandomgamer3088 morals are a tricky thing to give to a psychopath. it's not impossible, but the normal means (through empathy) don't work. and AIs are inherently psychopathic. they can fake empathy to accomplish their goals, but they don't actually HAVE it; they can trick you, they can say nice things, but they don't actually FEEL anything.
but, as i said, you CAN teach a psychopath to be an upstanding citizen, to have a kind of morality, but it's harder and you kinda need someone specialized in that kind of teaching.
@@tuseroni6085This! This is very true my friend.
@@tuseroni6085 where is that from?
I really wonder if those models that train running and movements take into account muscle metabolism. Like, if they had to account for energy in the muscles, all that weird jumping... I suspect that behavior would disappear pretty fast.
Literally thinking the same thing.
@@Slowly_Going_Mad right. I mean how can they let those little computery muscles get away with those shenanigans.
Makes you wonder if we do this in our dreams 😊
They're literally training robots to deliver packages the same way bad delivery drivers do. 😂
If you were to take the graph of acceleration of the objects the robot is handling, a higher amplitude of the peaks of the graph would be a pretty good signal that the robot isn't being gentle, so the amplitude of the peaks could be used to subtract points. That'd encourage speed while at the same time encouraging being gentle
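A sketch of that peak-amplitude penalty applied to a logged acceleration trace; `scipy.signal.find_peaks` is a real function, while the threshold and scale here are made-up numbers:

```python
import numpy as np
from scipy.signal import find_peaks

def gentleness_penalty(accel_trace, threshold=9.81, scale=0.1):
    trace = np.abs(np.asarray(accel_trace))
    # Indices of local maxima in the trace that exceed the threshold.
    peaks, _ = find_peaks(trace, height=threshold)
    # Subtract points in proportion to how far each peak exceeds it.
    return -scale * np.sum(trace[peaks] - threshold)
```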
These are pretty neat, their design kinda reminds me of a Go-Bot (in a good way)
It's about time someone combined the benefits of limbs and wheels, basically having 4 arms should also make it uniquely agile.
Now this is impressive!! I'm not sure Boston Dynamics or that new Tesla robot can match this level of dexterity.
throwing packages, just like the real ones, love it
Amazon (and UPS/FedEx) drivers have waaaay less than a decade left on their job clocks.
HI THIS IS AMAZING
learning in a simulation is what our brains do. that's what dreams are, according to some neuroscientists.
It treats packages better than most couriers
I'm not a big believer in reward shaping. It makes for funny papers with interesting results. But it is very finicky, very prone to reward hacking, and very much lacks generality.
Richard Sutton said something like: the reward function should only encode the goal, not the way to achieve it.
And I agree. We should work on other ways to drive the behavior during training.
It seems like we're getting better at understanding the ABCs of human motivations which underlie our neurology: Make the thing move. Make it spin. Things that change are more interesting than those that stay the same.
So if the robot AI thinks the mechanics of what it's doing is interesting, not just the goal, then it will explore different mechanics "for fun" and as a result it will become better than if it was just focused on the goal. Just like humans!
Most careful delivery driver
I mean if I were in a simulated box and spammed with TVs on all sides of the wall I'd give up too 😁
The way it handles those packages! I think it's ready for UPS, lmao.
Great video! Keep up the good work.
simply amazing! 🔥
I was just telling my dad about my idea to do this yesterday. Of course it already exists, we still need better simulations and more computing power to do it adequately though.
I believe simulation training for robots would get a LOT more effective once someone develops a good simulated environment that can account for material physics and the damage materials take. Simulated cardboard, drywall, stone, metal, and wood could all return damage numbers to the model with each engagement with the simulated robot. If we could have a sophisticated simulated environment that can account for damage the agent might do to it, or to itself based on the materials they are made of, then the reward function could be highly efficient. This would result in robots handling delicate materials with care and being a bit more aggressive with things like concrete walls and metal doors. Simply train your model in a simulated environment based on the environment it will be deployed in and the materials it will be working with.
Currently the solution for protecting delicate materials from robots is to cover any contact points with soft materials and to add tensioners to the joints. I submit that a robot trained with proper material physics could catch a falling egg with titanium hands.
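A toy sketch of that per-material damage idea, with invented fragility values; a real simulator would report impact forces per contact:

```python
# Fragility per material: higher means easier to damage. Values are made up.
FRAGILITY = {"cardboard": 0.8, "drywall": 0.6, "wood": 0.3, "stone": 0.1, "metal": 0.05}

def contact_damage(material, impact_force, threshold=5.0):
    # No damage below a gentle-contact threshold; above it, damage grows
    # with force, scaled by how fragile the material is.
    excess = max(0.0, impact_force - threshold)
    return FRAGILITY[material] * excess

def damage_penalty(contacts, weight=1.0):
    # contacts: list of (material, impact_force) pairs from the sim step.
    return -weight * sum(contact_damage(m, f) for m, f in contacts)
```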
i am curious how much memory would be needed to finally create a professional sniper drone... what a time to be alive
The concept of ‘reward’ is an interesting one. Clearly for humans there is a vast swathe of these reward motivators; I suppose it’s what makes us both individual and (when reward is pooled) able to act in teams to achieve goals and objectives.
Reward is only one part, there is a 'carrot and a stick' - the other part is punishment.
Our reward functions were grown over billions of years; let's see if AIs can do better in the span of 10. Soon AI will be able to suggest the best reward functions used to train other AI. The problem is that if humans don't understand these functions, you could have Skynet way too easily.
what strikes me is that a voxel-based game engine would work seamlessly in simulation and allow for pulling real-world voxel data from point clouds, such that the ML algorithm can replay events to learn offline on real-world problems.
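The point-cloud-to-voxels step that comment implies might look roughly like this quantization; the grid resolution is an arbitrary choice:

```python
import numpy as np

def voxelize(points, voxel_size=0.05):
    # points: (N, 3) array of xyz coordinates from a real-world scan.
    indices = np.floor(points / voxel_size).astype(np.int64)
    occupied = np.unique(indices, axis=0)  # one entry per occupied voxel
    return occupied  # feed these cells into the simulated environment
```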
I'm not excited about how smart our future masters are getting.
Okay so now we train cars on the roads. Then nobody drives and we all stay safe because the AI is trained better than a human.
I think the tv's code and the ai's code were interfering with each other.
The ai sees it not as a wall, but as a bunch of images that it didn't create itself and that were never programmed into its code.
Ai is programmed more to create than to observe and learn patterns and rhythm and timing.
There is always something it has never seen before, like a blind spot in the ai's code, that makes it act otherwise than the programmers intended.
If ai can be taught to observe instead of just “doing what it is told,” there wouldn’t be jumbled up messes that we see in ai art.
The way it yeets the box hahaha
The Wheelers are coming! 😱
I like how the robot's underbelly has a 😮face on it.
Training robotics on virtual worlds has been tried 1000 times and it never generalized well to the real world. A better video would start by saying that, and then go on to explain what exactly was done different this time and why did it work instead of spending half the video introducing the rather straightforward idea of training an agent on a simulated environment.
This channel went from being informative, technical and on point to falling off a cliff. Filler, pointless meanderings, trivia, and using 1000 words to get across every relatively simple point that could have been explained in less than 50.
the way it handles that package, it's ready for the US Mail.
All fun and games until the governments start getting these things up to max rank in call of duty AI training edition and then unleash killbots
everybody gangsta until the damn thing says "I know kung fu"
Great review.
Hot take: If a package can't survive the *yeet* of a delivery robot, it doesn't deserve to survive. It's time to extend darwinian logic to the machines of the world.
Programmers have difficulty with walking and running models because they don't know how it works.
Running, in particular, is simply a rapid prevention of falling. The body leans forward, begins to fall over, and is stopped by the legs and feet - repeat as needed. Walking is the same, only slower.
See how a toddler does this: they often start motoring forward, but can't stop, and the parent catches them. Stopping running is trickier than beginning to run, as it requires a rapid change in momentum, with the body flung backwards to stop the forward motion. Programmers should know this.
Why do I never hear the people developing new ai technology say "these tasks that are normally dangerous to humans are the ones we aim to automate" and "of course we did an ethics review before beginning research."
Wow, what’s the computational cost of this simulation?
How is your comment 1 day old?
@@SW-fh7he The computational cost of our simulation is too high, so they had to cut some corners.
@@SW-fh7he Skill
Channel members can view the video 16-24 hours before it's made public. @@SW-fh7he
Just in case you are the new AI overlord: I always thanked ChatGPT
Autobots, transform!
This thing could work for Southwest Airlines!
They will never use it for the military 🤨
What a time to be alive??😢
This thing is just slightly shorter than me, and those sudden sharp movements actually scare me. It could beat me up!