This mini GPU runs LLM that controls this robot

Nikodem Bartnik

Views: 100,177

Published 2 Feb 2025

Comments • 206

@nikodembartnik months ago ⁺¹⁰
The first 500 people to use my link skl.sh/nikodembartnik12241 will get a 1 month free trial of Skillshare!
@X862go months ago
Amazing work, mate 👏
@akissot1402 16 days ago
Wouldn't it be better if you trained your own vision model that only recognizes obstacles and drives the bot, and used an LLM for more complicated decision making?
@StevenIngram months ago ⁺⁷²
I hope you get the Jetson working. I like the idea of it being self contained. :)
@stedocli6387 months ago ⁺⁹
At 10:08 you said "If you persist, sooner or later you will run out of problems to solve". This is indeed a great place to be. Nice job!
@MilanKarakas months ago ⁺⁵¹
What is missing here is memory. Llama can understand a few things and may have a small amount of memory, but after you cycle the power, it forgets. It would be great to write a Python script that records the whole conversation. Also, some type of 3D mapping, where the robot can store past experience and mark the obstacles.
@GoodBaleadaMusic months ago ⁺³
Even just something that captures the basics and the last few moments of context.
@rodrigogomes6086 months ago ⁺⁵
Also, I think it would be better if he gave it data from some sensors, like the distance between the robot and the obstacles ahead.
@ezradlionel711 months ago ⁺²
Bro solved sentience with a sentence
@peacekeepermoe 24 days ago
@@rodrigogomes6086 Yes. Also a way to map the room, like robot vacuum cleaners do. It will help with navigation around the apartment instead of relying on the camera. The vacuum robots don't need a camera or lights, they can work in the dark/night.
@Larimuss 22 days ago ⁺¹
You can do this with a vector database connected to Ollama: store the history there and use Ollama RAG, or any RAG really.
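Editor's sketch of the simplest form of the memory idea in this thread: a log file that survives a power cycle and is read back into the next prompt. This is not the author's code; the file name `robot_memory.jsonl` and the functions `remember`, `recall`, and `build_context` are hypothetical names chosen for illustration, and a vector database with RAG, as @Larimuss suggests, would replace the plain "last N entries" lookup.

```python
import json
import time
from pathlib import Path

LOG_PATH = Path("robot_memory.jsonl")  # hypothetical file name


def remember(role, text, path=LOG_PATH):
    """Append one exchange to the log so it survives a power cycle."""
    entry = {"time": time.time(), "role": role, "text": text}
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")


def recall(last_n=5, path=LOG_PATH):
    """Return the most recent entries, oldest first."""
    if last_n <= 0 or not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines[-last_n:]]


def build_context(last_n=5, path=LOG_PATH):
    """Format recent entries as text to prepend to the next prompt."""
    return "\n".join(f"{e['role']}: {e['text']}" for e in recall(last_n, path))
```

Calling `remember("robot", "Saw a chair ahead")` after each step and prepending `build_context()` to the next prompt gives the model the "last few moments" of context that @GoodBaleadaMusic asks for.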
@engineeringstudent2700 7 days ago
I love how realistic this was! You covered all the issues you encountered and didn't sugarcoat anything! WAY TO GO MAN
@SuperFetaCheese 17 days ago ⁺⁴
It cracks me up how it constantly accuses your place of being cluttered for no reason.
@domramsey months ago ⁺³⁷
I think the biggest issue here is your overall approach and your prompt. You have a distance sensor that gives a precise result in cm, yet the quantities you're using are arbitrary: "low", "medium". In your prompt, tell the LLM that the nearest object in front is (say) 85cm away, the nearest to the left is 10cm away, and the nearest to the right is 200cm away, then ask it to output an angle to turn and a distance to travel forward. It will come back with "Angle: 20, Forward: 50" or similar, which should be easy for the robot code to process. Make every move an angle followed by a distance, but use actual measurements. Your prompt could probably also do more to get the LLM to guess how far away the objects it sees are likely to be.
Oh, and get more distance sensors and mount them at 45 degrees left & right. I really feel like these should be the primary input for guiding movement.
Yes, it's entirely possible that won't work at all. :)
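Editor's sketch of the prompt-and-parse loop @domramsey describes, assuming the model is asked to answer in exactly the "Angle: …, Forward: …" form quoted above. The function names, the prompt wording, and the `max_forward_cm` safety cap are assumptions made for illustration, not anything shown in the video.

```python
import re


def build_prompt(front_cm, left_cm, right_cm):
    """Put real sensor readings into the prompt instead of 'low'/'medium'."""
    return (
        f"Nearest obstacle ahead: {front_cm} cm. "
        f"Nearest on the left: {left_cm} cm. "
        f"Nearest on the right: {right_cm} cm. "
        "Reply only in the form 'Angle: <degrees>, Forward: <cm>'."
    )


# Matches e.g. "Angle: 20, Forward: 50" anywhere in the reply.
_REPLY = re.compile(
    r"Angle:\s*(-?\d+(?:\.\d+)?)\s*,\s*Forward:\s*(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def parse_move(reply, max_forward_cm=100.0):
    """Return (angle_deg, forward_cm), or None if the reply is unusable."""
    match = _REPLY.search(reply)
    if match is None:
        return None
    angle = max(-180.0, min(180.0, float(match.group(1))))
    forward = min(float(match.group(2)), max_forward_cm)
    return angle, forward
```

Returning `None` for anything that does not match lets the robot code stop and ask again rather than act on free-form text; clamping the values keeps a hallucinated "Forward: 5000" from being executed literally.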
@PB1XYZ 15 days ago
Something wonderful, Nikodem!!! Congratulations!!!
@Math2C months ago ⁺⁷
Here's what you should do next. Use computer vision software to identify only the names of the items in the room. Let it draw bounding boxes around the identified items. Combine the area of each bounding box with the distance from your distance sensor to determine the item's size. I'm not sure if you did this already, but your robot needs to know its actual location. Use the LLM to distinguish between objects that are permanently placed and those that are laid out. Record the various directions the rover has already looked in. So for each object the rover should know its size, its relative direction, and how far away it is. Finally, provide that information to the LLM to determine which direction it should move, or whether it should rotate.
@OriNachum months ago
With a Hailo-8L/Hailo-8 you can do that on a Raspberry Pi with surprising processing power
@M13RIX months ago ⁺¹
Man, your videos are so inspiring! They significantly help me not to give up on my own AI projects. I would love to see further improvements in this one, for example completely dropping paid services in favor of local but still high-quality ones (for TTS you can use Coqui XTTS: it runs locally, has a realtime version, and you can clone any voice)
@v1ncend months ago
When I saw your first robot, it brought back memories.
@DearNoobs months ago ⁺¹
i love this project, wish i wasn't so far behind on all my other projects because i want one of these too!! hahah GJ bud
@GhostMoney0007 19 days ago
Great idea, the same thing I was thinking I would start. Learned a lot from you
@geedub-b3i 14 days ago
You are a wonderful resource and inspiration, thank you!
@ginogarcia8730 29 days ago
oh my goshhh finallyyyyy, been trying to find something like this
@RemoteAccessGG months ago ⁺²⁴
I think you should make the robot remember previous image outputs (if you haven't already), so it will have some logic. Also add a lidar sensor if you find that camera hard to set up. Giving the information to the LLM will be tough, though, because it can't make sense of a bunch of random numbers given to it.
@SLRNT months ago ⁺⁴
I think the LLM could understand it if given the "format" of the lidar data, e.g. an array of 1,2,3, telling the LLM that the first number(s) mean the distance to the left, the second the distance to the front, and the third the distance to the right. Of course the array would be longer, and you could average the numbers or just separate the directions in code.
@Davidfirefly months ago
That would require some machine learning implementation
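Editor's sketch of @SLRNT's suggestion: reduce a long lidar scan to a few labelled averages in code before it reaches the LLM. It assumes the scan is a list of distances in cm ordered left to right across the robot's forward view; the function names and sector labels are hypothetical.

```python
from statistics import mean


def summarize_scan(scan_cm, sectors=("left", "front", "right")):
    """Split a left-to-right list of ranges into equal sectors and average each."""
    n = len(sectors)
    if len(scan_cm) < n:
        raise ValueError("need at least one reading per sector")
    size = len(scan_cm) / n
    summary = {}
    for i, name in enumerate(sectors):
        chunk = scan_cm[round(i * size):round((i + 1) * size)]
        summary[name] = round(mean(chunk))
    return summary


def scan_to_prompt(summary):
    """Turn the sector averages into one sentence the LLM can read."""
    parts = [f"{name} {dist} cm" for name, dist in summary.items()]
    return "Average distance to obstacles: " + ", ".join(parts) + "."
```

Averaging follows the comment as written; taking `min(chunk)` instead would report the nearest obstacle in each sector, which may be the safer choice for collision avoidance.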
@soeasy22 months ago ⁺⁵
Imagine a robot equipped with 16 NVIDIA H100 GPUs, running the 405B parameter LLaMA model, packed with sensors, exploring the world.
@GlobalScienceNetwork months ago ⁺¹
Yeah, this is a great thought. Easy to create and super powerful. You just need to give it a platform so it can interact with the world as well. We will see many products using this coming out soon. It could be similar to the Tesla Optimus humanoid robot with very little development.
@mako1-1- 8 days ago ⁺¹
It'll probably get destroyed, and parts will get stolen. It's a good idea though
@GlobalScienceNetwork 8 days ago ⁺¹
@@mako1-1- That is another good point!
@CandaceSimpson-z8w 3 days ago
Your robot sees what the video shows it: a 2D world. At certain angles this makes your room look full, depending on what % of the floor it is seeing.
@CandaceSimpson-z8w 3 days ago
Try pointing the camera a little more towards the ground to help its perception.
@catalinalupak months ago ⁺¹
Great progress on your project. I like your attitude and thinking. You should also try n8n for more local logic on that Jetson Orin Nano. It will also help you build a map of the environment and store it locally, which will speed up decision making too. Looking forward to your next steps
@watchingpaintdry1411 14 days ago ⁺¹
Consider adding a HUD to the camera input, showing robot position, orientation, maybe even a minimap.
months ago ⁺²
For very simple things, a microcontroller could be nice for learning programming, but IMO something like a Raspberry Pi (a computer with GPIO) is a much more useful starting point for robotics. Imagine you are creating a robot and you want to change the code, see what the code is doing, see the camera, etc. You find a bug in the code, so you just SSH in over WiFi, change the code with nano, run it, see what it does, and so on. Now imagine doing that with a microcontroller. For any bug, you need to get to the robot, turn it off, plug in a USB cable, program it (in the case of Arduino, wait for the compile...), unplug, power up... It gets tiring pretty quickly.
@yt742k months ago ⁺¹
We are now in the age of "talking animals". This prediction is so spot on
@AngweenAnnora 17 days ago ⁺¹
I'm already imagining a superintelligent Roomba with its greatest mission: to clean the world.
@jathomas0910 months ago ⁺⁸
I'm watching this high as hell, when she said "let's head over there to the wall where humans hang out" I nearly died laughing omfg 🤣😂😂🤣😂🤣😂🤣😂💀🙏🏾😇 12:00
@legendaryboyyash months ago ⁺⁷
damn I was actually learning robotics just to make this exact same robot controlled with ollama after watching your chatgpt one, I was planning on using raspberry pi 5 for everything, guess u beat me to it and made it even better lol
@nikodembartnik months ago ⁺²
Thank you! Keep working on your project and make it better than I did!
@legendaryboyyash months ago ⁺²
@@nikodembartnik thanks :D I'll try my best to meet your expectations :D
@erutku 18 days ago
alas a great experiment(s). well done.
@andyburnett4918 months ago
I love watching your videos. They are very inspiring.
@HamzaKiShorts months ago ⁺¹
I love your videos!
@alikorn2035 13 days ago
Dude, your videos are inspiring! I think you could improve your robot by adding a rangefinder of some kind. Also, I would change the control type to a coordinate system, so there would be more options for maneuvering.
@Pawlixus 24 days ago
I'm proud that we have our young Skywalker in Poland
@Confuzer months ago
I built a ChatGPT ESP32 bot in only a few weeks, and it's fun to load new prompts on it. Wish I had the same memory as the chat interface, that one is pretty good. For now I send a summary with every API request, which also increases the token count. Also, I don't think a local LLM will be up to par, but the frequency can be real time and unlimited. But that will be the 3rd project; my second bot is with a Pi Zero.
@loskubalos months ago ⁺²
Well, it looks interesting, I have to watch it when I have a free moment
@jakub38200 months ago ⁺¹
there are more Poles here than you think
@loskubalos months ago
@jakub38200 I know, because Nikodem is from Poland. I watch both of his channels
@kodomyataren6634 29 days ago
Excellent, thanks a lot, it's a perfect example. For distance you can try a ToF RPi camera; I haven't tested it yet
@luc8350 15 days ago
hey dude. great video but a small recommendation... in some parts of it the audio/music is pretty loud compared to your audio. this would be just as great with no music.
@anonym_user_nksnskdnkdnksndkn months ago ⁺¹⁰
Do you think you could make a drone, controlled by GPT? would be sick XD.
@manuel_elor months ago
Up
@sergemarlon months ago ⁺²
Seems possible. Drones can hover in space, acting like these robots without moving their wheels. The issue I see is that you would need a large drone in order to handle the payload of the electronics. It may be possible to stream the video from a drone to a stationary PC which then computes and sends the radio signals to the drone.
@pimf-youtube 21 days ago
Great video! Have you thought about running one model only for navigation and a second one for talking? Maybe it will be smoother. Or add something like Lidar for navigation (like some vacuums). No idea how this stuff works but maybe you do. Definitely subscribing in hope for the next iteration.
@adolfoquevedo7429 27 days ago
thanks, great video!
@MyPhone-qg2eh months ago ⁺⁴⁹
But your text to speech isn't local.
@colinmcintyre1769 months ago ⁺²
You don't want it to be.
@slapcitykustomz1658 months ago
@@colinmcintyre1769 Why not? Nvidia has local (llamaspeak) text to speech, and (OpenAI Whisper) for speech recognition; both libraries can be run locally on the Jetson
@ChigosGames months ago ⁺⁴
@@colinmcintyre1769 why not? If the computer can create beautiful voices as well then everything could be locally local.
@colinmcintyre1769 months ago ⁺³
@ChigosGames you want to utilize as much compute as you can for the best results, I'd assume. By trying to do everything locally, it's instantly much more expensive and less practical.
@ChigosGames months ago ⁺⁶
@colinmcintyre1769 I fully understand you. But if you outsource everything to paid APIs, real-life products will be unaffordable. Imagine making a product that only consumes APIs: you could only sell it for a hefty price with a steep subscription on top.
@sethunthunder months ago
here before "1 hour ago", creative project bro, keep the work up!
@mr.makavelifashion566 11 days ago
Since I am working on a self-balancing robot, I think I've got some knowledge of this. Your robot takes a long time, and sometimes it can take forever just to find something that was right behind it. I suggest you add certain commands: for example, before doing anything, it does a 360 on the spot to analyse its surroundings, so it knows where it's heading and notices whether the item it is looking for is nearby
@Gabokor-76 21 days ago
Bro is creating the next terminator
@64jcl months ago
Using Llava model image descriptions alone is not really enough for navigation, although it is an interesting experiment. One thing you should try is to make your robot scan the environment: rotate it 90 degrees, take a picture, analyse it, and repeat. When you have 4 descriptions you can make a judgement about where to go based on whatever the goal is. Of course this is somewhat slow. You could also run the image through a depth analysis model. That spits out a gradient image based on depth estimation, and those are very good at showing where the best path might be, although you'd have to calculate the approximate rotation based on which area of the image you decided the robot should navigate to (either towards the closest object or to where there are no objects).
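Editor's sketch of the last step @64jcl mentions: turning a depth image into an approximate rotation. It assumes the depth map is a list of rows where a larger value means farther away (some depth models output the inverse, in which case the values would need flipping first), a hypothetical 60° horizontal field of view, and a linear pixel-to-angle mapping, which is only an approximation for a real lens.

```python
def steering_angle(depth, hfov_deg=60.0, bands=5):
    """Return degrees to rotate (negative = left) toward the most open band.

    depth: list of rows of distance estimates, larger = farther (assumed).
    The image is cut into vertical bands; the band with the largest mean
    depth is treated as the direction with the most free space.
    """
    width = len(depth[0])
    if width < bands:
        raise ValueError("image is narrower than the number of bands")
    band_w = width / bands
    best_band, best_score = 0, float("-inf")
    for b in range(bands):
        lo, hi = round(b * band_w), round((b + 1) * band_w)
        values = [row[x] for row in depth for x in range(lo, hi)]
        score = sum(values) / len(values)
        if score > best_score:
            best_band, best_score = b, score
    center = (best_band + 0.5) / bands  # 0.0 = left edge, 1.0 = right edge
    return (center - 0.5) * hfov_deg
```

To head towards the closest object instead of open space, as the comment also allows, the comparison would pick the band with the smallest mean depth.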
@markusstaden months ago
I like the idea of using the vision capability of LLMs. You could try to preprocess the image, for example using depth estimation, or using a lidar and putting the data on an image. Smaller models might have problems understanding the data, but that could be solved using fine tuning I guess.
@NakedSageAstrology a day ago
I would love to see this with DeepSeek R1.
@thomasschon 29 days ago
The robot started testing your patience, so you decided to end the experiment before it was too late.
Dude, you seriously need to attend some anger management classes. 😅
@jtreg months ago
so good! Messing about with my Tesla K80 today... a bit limited what it can run but llava is ok np
@GlobalScienceNetwork months ago
Cool video. Bluetooth latency: ~100-200ms. Just a heads up that this could be one of the issues for real-time obstacle avoidance. WiFi latency: 15-30ms. Analog RF systems: ~5-10ms. These systems should all be on board, or analog if sending to an external source for computing. The LLM computing will add further delay but should be quick if you use a trained network. However, if you want to train based on your environment from its sensors' perspective, I would think you would want to do some training and have a custom network. I am not sure how difficult that would be to achieve. Personally, I am going to try a more basic approach and stay analog for everything and not use an LLM. So it might take me more than 10 minutes to program.
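Editor's note on what those latency figures mean in practice: the distance the robot covers before a command can take effect is speed multiplied by delay. A minimal sketch, taking the upper bound of each range quoted in the comment and assuming a driving speed of 0.3 m/s, which is a made-up value for illustration and not a figure from the video.

```python
# Upper-bound link latencies quoted in the comment above, in milliseconds.
LINK_LATENCY_MS = {"bluetooth": 200, "wifi": 30, "analog_rf": 10}


def blind_distance_cm(speed_m_per_s, latency_ms):
    """Distance in cm travelled while a command is still in flight."""
    return speed_m_per_s * 100 * (latency_ms / 1000)


if __name__ == "__main__":
    speed = 0.3  # m/s, assumed for illustration only
    for link, ms in LINK_LATENCY_MS.items():
        print(f"{link}: {blind_distance_cm(speed, ms):.1f} cm")
```

Model inference time adds to the delay in the same way: at the assumed 0.3 m/s, a hypothetical one-second inference step would correspond to about 30 cm of travel, several times the Bluetooth figure.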
@g0ldER months ago
I have a bot called Cozmo, which has some basic internal 3D modeling that it uses for object permanence! You should try this, it only uses one really bad camera.
@MrMcBauer months ago
You should try two LLMs on one robot. One for decision making and one for the controls.
@AmoZ-u7b months ago
Hey man, I love this series
@OmPrakash-ai months ago ⁺¹
I feel like adding more sensors, like LiDAR, could help the LLM make better decisions. Also, what if all the data from those sensors and cameras were used not just for reacting, but for planning ahead and executing smoothly? It might make the robot feel less… stuttery, you know?
@CheerApp months ago
Great project, just a thought... maybe initially the robot should detect types of objects and, if the type found is "book", then attempt to read the cover title?
@truthtoad months ago
maybe adding a FLIR cam could assist its navigation, great work. I want a nano😝
@lakshaynz 28 days ago
👏👏👏👏 excellent, inspiring!
@LukaNegoita months ago
Have you considered adding additional sensors that get fed to the LLM? For example, a distance sensor, so that the prompt that goes to the LLM is the image description plus something like "and the distance is xx centimeters in front of you"
@MX-Vette 26 days ago
You need a bumper sensor all around the robot so a signal can come back to say if it's collided with something. Pair that with an enhanced memory, and then it will remember that seeing something at a certain distance results in a collision every time, and attempt to navigate around it.
@sKrible144 28 days ago ⁺¹
you should give the robot a GLaDOS voice from Portal
@MrBooks36 months ago
you could give the robot a sense of its surroundings by adding distance sensors and feeding the info to the LLM
@angelbar months ago
You need to generate volumetric data from stereoscopic images and create a semi-permanent map with constant updates
@stumcconnel months ago
This is so damn cool, what a huge step up to have everything local!
From that part at around 13:50 where you omitted all the extra output processing and just let it run around, operating immediately on each result, it looks like it never really paused in its movement, but was processing an image roughly every second? Maybe the images were all just blurry because it was moving and couldn't be processed well? Or did it pause briefly to get each shot?
Sorry if you'd already accounted for that, maybe the camera frame rate or whatever is plenty fast enough!
@FelipePereira010 28 days ago
😃🤝🫂It HAS Portuguese dubbing, thank you my friend ❤, you've gained one more subscriber 👍, excellent work!!! 🎉
@bananabuilder2248 months ago ⁺²
Just a suggestion, what if you added LIDAR to improve obstacle avoidance!
@nikobellic570 months ago
This is the coolest thing
@X862go months ago
Awesome 👌
@MrJohnboyofsj months ago
Build a Minecraft villager, give it the same pathfinding AI as in Minecraft, you'd just need a way of 3D scanning the room into voxels.
@OZtwo months ago
Very very cool! I've been waiting for someone to try this! I stopped playing with my robots when LLMs came out, knowing they would be better than DL -- was I right? Please prove me right! :) Also, by mixing both Pi and Jetson you get much better overall servo control, as the Jetson really has no power to support them. Very cool! Hint: I hope you use two LLMs that can talk to each other, the way our brain works... (edit: are you using the LLM's API or directly chatting to it?)
@simeonnnnn months ago
Zima Blue 💙
@MSworldvlog-mr4rs months ago
You should use ultrasonic sensors in all directions, and you can send a depth map with the img to this LLM
@_taki.debil_ months ago
You can also use Piper TTS, it runs locally and you can train a custom voice for it.
@rhadiem months ago
Also xtts v2, F5, more
@redthunder6183 months ago ⁺⁴
If you're using Ollama, you should try making the context window bigger. The default is 2,048 tokens, which fills up very fast after 2-3 API calls, especially with images.
If you're using a GPU like the 4060 Ti, you can bump that up to at least 16,000 easily while keeping the same performance. For comparison, I have 12GB of VRAM and I am able to run llama3.1 8b with a 28k context size. This should help significantly with things that require more than 3 steps. You can also keep track of the prompt size as it builds up, to know when it overflows and starts truncating and forgetting stuff.
Also, as for navigation, the LLM has no context for its position, where it is relative to the world, etc. You would need to design a system that gives it enough information to gauge its relative position and make informed decisions. For example, if you are able to get the relative position of 3 random points, it should be possible to triangulate your exact relative position, and you could overlay those 3 points as 3 different colored dots on the image. This is a bad example because you're asking the LLM to triangulate its own position, but it shows the idea of modifying the image to give it more context.
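Editor's sketch of the context-window suggestion, using Ollama's local REST endpoint `/api/generate` with the `num_ctx` option. The model name `llava`, the value 16384, and the helper names are assumptions for illustration; whether a given context size fits depends on the GPU's memory. The usage check relies on the `prompt_eval_count` and `eval_count` fields of the response, which can be absent in some cases, hence the defaults.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt, image_b64=None, model="llava", num_ctx=16384):
    """Build a generate payload with an enlarged context window."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},
    }
    if image_b64 is not None:
        payload["images"] = [image_b64]  # base64-encoded image data
    return payload


def generate(payload, url=OLLAMA_URL):
    """Send the payload to a running Ollama server and return the parsed JSON."""
    data = json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


def context_usage(response, num_ctx):
    """Fraction of the context window used by this call (prompt + reply)."""
    used = response.get("prompt_eval_count", 0) + response.get("eval_count", 0)
    return used / num_ctx
```

Logging `context_usage(response, 16384)` after each call is one way to follow the advice about tracking prompt size: a value approaching 1.0 suggests older history is about to be truncated.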
@simongilbert7895 4 days ago
Maybe also produce a map with an estimate of the preceding position, which is fed to the vision model and updated each time? I like the idea of a kind of heads-up overlay being passed to the model as you've described, with dots or even data like distances to help it.
@KJLT20 29 days ago ⁺¹
It needs memory and you should add lidar too
@imdaboythatwheheryeah months ago ⁺⁵
Says he wants a local system, 5 mins later uses a paid TTS service because he gets money from them. Your goals are flimsy and disappointing
@rhadiem months ago
👌
@imranmaj8159 26 days ago
thanks great videos
@alexany4619 18 days ago
Great, but I couldn't see in the video how you integrated the GPU on the robot?
@erniea5843 months ago
Working with Jetson nano is such a pain in the A
@nikodembartnik months ago
Why? So far it seems like any other raspberry pi/Linux based sbc
@unknownumar243 24 days ago ⁺¹
You can use a Hugging Face text-to-voice model for converting text to voice
@haithem8906 17 days ago
Try the Kokoro model. It sounds really good and is local
@MaxSvcks 28 days ago ⁺¹
Great project. I'm also trying to build a robot with an LLM built in. Maybe I want to try to run it completely locally with a Raspi and an NPU HAT. If it doesn't work I will self-host Ollama on my server :)
@pitong1989 months ago
Have you thought about giving the robot proximity sensors or lidar to allow it to see the distance to given objects? It could integrate vision with distance detection to enable faster movement and automate it to some extent
@stony42069 months ago
Seems turning speed is faster than computational speed
@christiansrensen3810 months ago ⁺¹
I like your vids, great job.
But before filming you could have cleaned up a bit?
@ChristiaanNdoro 17 days ago
Is it possible to have a separate LLM that evaluates how well the robot is doing and prompts it to try again?
@engineeringstudent2700 7 days ago
INSPIRED
@Honzo64 29 days ago
Regarding the sound output: why don't you just connect a bluetooth speaker? In case the Jetson doesn't have bluetooth you can add a dongle.
@5mxg 28 days ago
I'm waiting for a vacuum cleaner with a little arm that can pick up the cables lying where it wants to clean.
@conorstewart2214 19 days ago
The new Jetson Orin Nano is not really a new model at all, it is just an extra 25 W power mode. You can just update an old Jetson Orin Nano to get this power mode too.
@michaelpaine9184 24 days ago
A multi-agent system would help separate the driver (pathfinding) and the decision maker.
@power_death_drag months ago
you should add a lidar detector to measure distance, seems like it can't tell 5 meters from 10 cm
@RenardThatch months ago
So you need to figure out how to tell it how much distance it would cover with each of the predefined scripts you use to execute movement, or get super sweaty and give it all the specs of the hardware it's running on so it can determine how far or fast to go, with the Pi measuring power to each motor... Then give it a sense of scale with a secondary model designed to run on stereoscopic cameras... Use that model to give the dimensions back to the decision-making model, and then figure out how to run that Jetson on it to keep it all self-contained, and then... I think you should call Mark Rober, because you're basically designing a Curiosity rover for consumers at that point.
@ryzowskyyryzi 25 days ago
Man, I have huge respect for you
@miltontavaresinfo months ago
Very nice 👏🏽 New subscriber!👍🏽
@MarxTech_DIY 2 days ago
What if you have a Raspberry Pi or an Arduino driving the motors and have the Jetson send the commands through serial to the RPi or Arduino?
@takeraparterer months ago ⁺¹
"offline" and then it uses paid closed source APIs
@newBee-j1q 5 days ago
This is so great and inspiring! What I want to know: how does the LLM control the robot? Can anyone tell me?
@Hojitashima 29 days ago
what about a 360 camera, where you give it the possibility to see the whole room?
@beastmastern159 months ago
You can use the Raspberry Pi AI HAT with the Hailo-8 module to run a Llama 3.2 model. The Raspberry Pi 5 has great graphics computing capacity, but the AI module is great for running AI models. I follow you, I like your content, keep going
@josh-barth months ago
Explain the prompts and the data exchanged. How did you form context? You say you're changing the prompt and the task but don't tell us what actually changed. How does the machine go? I get the multimodal RAG aspect, but how does the LLM know to respond with an intent to move? Then how is that given to the Pi? What's that datagram look like?
@5fsrinikshith436 months ago ⁺²
under 15 mins batch here!!
@drj2220 months ago
great!
@sandinopaulguerroncruzatty4440 months ago
A little drone with AI would be interesting
