I Ran Advanced LLMs on the Raspberry Pi 5!

  • Published Jan 6, 2024
  • Honestly, I'm shocked...
    Step-by-step tutorial guide: / run-advanced-llms-on-y...
    Product Links (some are affiliate links)
    - Flirc Pi 5 Case 👉 amzn.to/3UbcOq6
    - Flirc Pi 4 Case 👉 amzn.to/3Si2nyl
    - Raspberry Pi 5 👉 amzn.to/3UhGL7J
    Local Model Management
    ollama.ai/
    Mistral7B Model
    mistral.ai/news/announcing-mi...
    Hardware
    www.raspberrypi.com/products/...
    For Text to Speech (WaveNet)
    cloud.google.com/text-to-spee...
    🚀 Dive into the fascinating world of small language models with our latest video! We're pushing the boundaries of tech by running various open-source LLMs like Orca and Phi on the new Raspberry Pi 5, a device that's both powerful and affordable.
    🤖 Discover the capabilities of GPT-4 and its reportedly massive 1.7T parameters, and see how we creatively use a Raspberry Pi 5 to explore the potential of smaller, more accessible models. We're not just talking about theories; we're running live demos, showing you the models in action, and even making them 'talk' using WaveNet text-to-speech technology.
    🔍 We're testing every major LLM available, including the intriguing Mistral 7B, and examining their speed and efficiency on compact hardware. This exploration covers a range of practical questions, from the possibility of accelerating performance with edge TPUs to the feasibility of running these models on a cluster of Raspberry Pis.
    📡 Experience the implications of 'jailbroken' LLMs, the privacy of interactions, and the possibility of a future where the power of LLMs is harnessed locally on everyday hardware. Plus, we address some of your burning questions like, "Who was the second person to walk on the moon?" and "Can you write a recipe for dangerously spicy mayo?"
    🛠️ Whether you're a tech enthusiast, a Raspberry Pi hobbyist, or simply curious about the future of AI, this video has something for you. We've included a step-by-step guide in the description for those who want to follow along, and we're exploring the potential of these models for commercial use and research.
    ✨ Join us on this journey of discovery and innovation as we demonstrate the power of language models on the Raspberry Pi 5. It's not just a tutorial; it's a showcase of capabilities that might just change the way you think about AI in everyday technology!
    🔗 Check out our detailed guide and additional resources in the description below. Don't forget to like, share, and subscribe for more tech adventures!
  • Science & Technology

Comments • 267

  • @Illusion_____
    @Illusion_____ 4 months ago +228

    Viewers should probably note that the actual text generation is much slower and the video is massively sped up (look at the timestamps). This is particularly true for the multimodal models like LLaVA, which can take a couple of minutes to produce that output. These outputs are also quite cherry-picked; a lot of the time, these quantized models can give garbage outputs.
    Not to mention most of the script of this video is AI generated...

    • @tcurdt
      @tcurdt 4 months ago +44

      Sneaky, given that the blog post with the details is a PDF costing 7 bucks.

    • @goetterfunke1987
      @goetterfunke1987 4 months ago +3

      Thank you! I installed it on a faster SBC and it's slower than in the video xD I was already wondering if it was sped up, but there's no info that it's sped up?!

    • @leucome
      @leucome 4 months ago +4

      @@lonesome_rouleur5305 I mean, it's not hidden; we can easily see that it's sped up. The time is going up like crazy in htop.

    • @myname-mz3lo
      @myname-mz3lo 4 months ago +9

      @@lonesome_rouleur5305 There are loads of tutorials on how to do it for free. Let him make a living. Otherwise you'll be complaining that he doesn't upload because he has to get a job lol

    • @user-vl4vo2vz4f
      @user-vl4vo2vz4f 4 months ago +3

      But what if instead of running it on a Raspberry Pi, we ran it on a top computer, like an M3 Mac?

  • @slabua
    @slabua 4 months ago +274

    Clickbait, I came here to check the display lol

    • @CaletoFran
      @CaletoFran 4 months ago +19

      Same 😂

    • @Chasing_The_Dream
      @Chasing_The_Dream 4 months ago +3

      Yup

    • @fire17102
      @fire17102 4 months ago +2

      Thought this was a Rabbit R1 competitor, looks dope

    • @DanijelJames
      @DanijelJames 4 months ago +1

      Yup

  • @robfalk
    @robfalk 4 months ago +13

    Llama 2 got the 1952 POTUS question wrong. Harry S. Truman was POTUS in 1952. Eisenhower won the 1952 election, but wasn’t inaugurated until 1953. Small, but an important detail to note.

    • @jayrowe6473
      @jayrowe6473 2 months ago +3

      Good catch. When he typed the question, I answered Truman, but when Eisenhower came up, I thought my age was catching up to me.

  • @doohickeylabs
    @doohickeylabs 2 months ago

    Fantastic Tutorial! Looking forward to more from you.

  • @beyblader261
    @beyblader261 4 months ago +7

    I was thinking for the past month of trying this, but the edge TPU's bandwidth plus my scepticism about any successful conversion to TFLite held me back. Never knew the Pi's CPU was that capable. Anyway, what inference speed (in tokens per second, roughly) did you get on Mistral 7B?

  • @sentinelaenow4576
    @sentinelaenow4576 4 months ago +17

    This is absolutely fascinating! Thank you so much for sharing. It was just a year ago that we were blown away by this multi-billion-dollar tech, and now it can run on a small Raspberry Pi. It's an amazing exploration you did here. Please keep it up.

  • @ZweiBein
    @ZweiBein 4 months ago

    Wow! Much appreciated, thanks for the video, subbed!! Keep it up.

  • @Khethatipet
    @Khethatipet 4 months ago +1

    I've been waiting for this video for months! 😆Thanks for putting it together!

  • @sudhamjayanthi
    @sudhamjayanthi 4 months ago +28

    Nice video!
    A little correction at 10:41: privateGPT doesn't train a model on your documents but does something called RAG; basically, it smartly searches through your docs to find the context relevant to your query and passes it on to the LLM for more factually correct answers!
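
    A minimal sketch of that RAG flow (an editor's illustration, assuming the `ollama` Python package and a local Ollama server; the tiny `docs` list is hypothetical):

    ```python
    # Embed docs, retrieve the closest one to the query, and pass it to the
    # model as context. Note the model itself is never retrained.
    import ollama

    docs = [
        "The Raspberry Pi 5 has a quad-core Cortex-A76 CPU.",
        "Ollama serves local models over a REST API on port 11434.",
    ]

    def embed(text):
        return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    query = "What CPU does the Pi 5 use?"
    q = embed(query)
    best = max(docs, key=lambda d: cosine(q, embed(d)))  # retrieval step

    reply = ollama.chat(
        model="mistral",
        messages=[{"role": "user", "content": f"Context: {best}\n\nQuestion: {query}"}],
    )
    print(reply["message"]["content"])
    ```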

    • @ArrtusMusic
      @ArrtusMusic 4 months ago +3

      Thanks, nerd.

    • @issair-man2449
      @issair-man2449 4 months ago +1

      I really like your explanation...
      Could I also ask you, what does that do to the LLM? Are LLMs teachable, or are they supposed to be trained on the information over and over until it's mastered?
      For example, if I tell an LLM that 1+1 = 2,
      will it remember it forever or do I need to repeat it many times?

    • @gigiosos1044
      @gigiosos1044 4 months ago

      @@issair-man2449 I mean, in theory they just generate one probable word after another following the given prompt. The probability of the next word depends on the training database and the "training setup" in general; once the model is trained, the "weights" that decide the probability of the word generated in a given context are fixed. So if you tell a model "from now on you live in a world where 1+1 = 3", it will probably keep saying (in that conversation) that 1+1=3, because that's the most probable thing to say after you made that assertion. But if you start a new conversation, you'll need to specify it again in the new prompt, because the databases used to train LLMs usually contain data that says "1+1=2". Alternatively, you could fine-tune the model (basically adding new data to train the model to respond in a certain way to a specific stimulus); that way the "weights" are modified and you end up with a (slightly) different model.
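
      A toy illustration of that "most probable next word" idea (made-up numbers, not from any real model):

      ```python
      # Fixed "weights" (logits) give a probability distribution over candidate
      # next words; generation just samples from it, and the weights never change.
      import math, random

      candidates = ["2", "3", "fish"]
      logits = [4.0, 1.0, -2.0]  # toy scores a model might assign after "1+1="

      def softmax(xs):
          exps = [math.exp(x) for x in xs]
          return [e / sum(exps) for e in exps]

      probs = softmax(logits)
      print({w: round(p, 3) for w, p in zip(candidates, probs)})
      print(random.choices(candidates, weights=probs, k=5))  # "2" dominates
      ```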

    • @Kazekoge101
      @Kazekoge101 4 months ago

      @@ArrtusMusic Welcome, douche.

    • @sudhamjayanthi
      @sudhamjayanthi 3 months ago +1

      @@issair-man2449 Hey, sorry, just saw your question!
      LLMs themselves do not actually remember stuff (i.e. no long-term memory), but there are applications built on top of them that leverage techniques similar to RAG.
      That being said, you can actually "fine-tune" a model with certain information or a replying style to customize an LLM to your requirements, which is usually a much more complex process.

  • @RampagingCoder
    @RampagingCoder 4 months ago +10

    You should mention they are quantized and pretty bad; not only that, but they take several minutes to reply versus less than 10 seconds on a mid-range GPU.

  • @AlwaysCensored-xp1be
    @AlwaysCensored-xp1be 4 months ago

    Great vid, been wanting to know this for ages.

  • @aaishikdutta290
    @aaishikdutta290 4 months ago +4

    What was the speed (tokens/sec) for these models? Have you recorded it somewhere?
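
    For anyone who wants to measure this themselves, a sketch assuming a local Ollama server (its /api/generate response reports eval_count and eval_duration):

    ```python
    # Request a completion and compute tokens/sec from the response stats
    # (eval_duration is reported in nanoseconds).
    import json
    import urllib.request

    payload = json.dumps({
        "model": "mistral",
        "prompt": "Why is the sky blue?",
        "stream": False,
    }).encode()

    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        result = json.load(resp)

    print(result["eval_count"] / (result["eval_duration"] / 1e9), "tokens/sec")
    ```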

  • @mwinsatt
    @mwinsatt 4 months ago +3

    Dude, you should design an end-to-end doomsday/prepper Raspberry Pi LLM machine! This is honestly such amazing work. It looks like you've already developed the scaffolding and initial prototype for this type of device.
    I wonder if you could build an all-in-one package at a reasonable cost that does local LLM inference with a larger model instead? That would be so awesome. One of the most useful features would probably be an easier method to input queries. I wonder if it's feasible to use a speech-to-text model.

  • @garrettrinquest1605
      @garrettrinquest1605 4 months ago +7

    The fact that these can run on Raspberry Pi is crazy. I always assumed you needed a pretty beefy GPU to do any of this

    • @soldiernumberx8921
      @soldiernumberx8921 4 months ago

      Google even made a booster device specifically for the Pi to run AI models on; it's like 99 dollars or something.

    • @henson2k
      @henson2k 2 months ago +4

      If you're ready to wait a few hours or days, models can run on pretty much anything.

  • @Derick99
    @Derick99 4 months ago +5

    I watched this three times lol I love this. Thank you! Do you think the Raspberry Pi 5 is the best single board for the job, or would the ZimaBoard compare just as well, if not better?
    Also, since you repurposed a wifi adapter, would you have an idea how to tear down old PCs and laptops and combine the hardware to get the VRAM needed for an upgrade like this? Probably more complex than it needs to be, but I've got a whole bunch of old computers with junk processors and low RAM by today's standards, and it feels like you could repurpose a lot of that stuff with a different board, or if we just flashed Windows off 😂 and used the motherboard and maybe even part of the laptop screen or something, idk lol. A way to combine multiple processors or something to create a Frankenstein that works well lol
    Or another side project: a control box for a golf simulator. Basically just buttons mapped to a keyboard, with decent housing. Maybe your box is for an arcade emulator, or controls your smart home or sound setup, idk 🤷‍♂️

  • @eckee
    @eckee 4 months ago

    0:33 Is the thing in your hand what's on the thumbnail? And does it have a screen like in the thumbnail, or was that an edit?

  • @olealgoritme6774
    @olealgoritme6774 4 months ago +8

    You can run the 13B models with 8GB RAM. Just add a swap file in Linux of e.g. 10GB. It's slower, but it will still run with Ollama and other variants.
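
    For reference, a sketch of the usual swap-file sequence, driven from Python (assumes root; and as the replies below note, heavy swapping wears out SD cards and SSDs):

    ```python
    # Create and enable a 10GB swap file on Linux (run as root).
    import subprocess

    for cmd in (
        ["fallocate", "-l", "10G", "/swapfile"],  # reserve the space
        ["chmod", "600", "/swapfile"],            # swap must not be world-readable
        ["mkswap", "/swapfile"],                  # write swap metadata
        ["swapon", "/swapfile"],                  # enable until next reboot
    ):
        subprocess.run(cmd, check=True)

    print(subprocess.run(["swapon", "--show"], capture_output=True, text=True).stdout)
    ```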

    • @_JustBeingCasual
      @_JustBeingCasual 4 months ago +5

      that's a killer for your SSD/SD card.

    • @dannymac653
      @dannymac653 4 months ago +7

      @@_JustBeingCasual Digitally destroy a microSD card any% speedrun

    • @myname-mz3lo
      @myname-mz3lo 4 months ago +1

      Or use something that has a GPU, like a Jetson Nano.

  • @BCCBiz-dc5tg
    @BCCBiz-dc5tg 4 months ago +1

    Awesome work! It will be interesting to see how Mistral AI's ~GPT-4 equivalent performs on Pi edge compute.

  • @AlwaysCensored-xp1be
    @AlwaysCensored-xp1be 2 months ago

    Got about 14 LLMs running on my Pi 5. This is the vid that started my dive down the AI rabbit hole. You can have multiple Ollama LLMs running at once as long as only one is answering a prompt.

  • @GRITknox
    @GRITknox 4 months ago +1

    Ollama looks like a very helpful pull, ty for that. I've been trying for a couple weeks to train with the Coral TPU. Coral having so many dated dependencies breaks pip every time (for me, a dude who isn't the smartest). Next run at it will be with conda and optimum[exporters-tf].

  • @davidmiscavige5663
    @davidmiscavige5663 4 months ago +8

    Great video, but it's a huge shame you didn't show how long each one takes to process before responding...

    • @mhaustria
      @mhaustria 4 months ago +1

      great question

    • @henson2k
      @henson2k 2 months ago +2

      That would ruin a surprise...

  • @TigerPaw193
    @TigerPaw193 4 months ago +3

    Llama 2, you missed one question. At 7:21: the US president in 1952 was NOT Dwight David Eisenhower; it was Harry S. Truman. Eisenhower won the election in November 1952 and was then inaugurated on January 20, 1953.

    • @fabianreidinger6456
      @fabianreidinger6456 4 months ago

      And Pérez wasn't president in 1980...

    • @ydhirsch
      @ydhirsch 4 months ago +3

      The clear lesson here is that this is software about credibility, not accuracy. It's only as smart as the not-so-smart sources on which it was trained: garbage in, garbage out. At least with Wikipedia, there are checks and balances, with people of differing opinions having access to make corrections. Not so with LLMs. Fact checking costs extra.

  • @ernestuz
    @ernestuz 4 months ago +1

    I think your description of how PrivateGPT works is wrong. I think it stores the texts in a vector DB and then uses a different model to check the DB against your prompt; the DB returns some text that is injected into the context along with your prompt, using the model you have chosen. Please correct me if I am wrong, I just had a quick look at the sources.

  • @AlexAutrey
    @AlexAutrey 4 months ago +6

    It would be slower, but I'm curious whether setting up ZRAM or increasing the cache size with an SSD or NVMe drive might be what's needed to run the larger language models.

    • @ferdinand.keller
      @ferdinand.keller 4 months ago +4

      You can run them that way; the issue is that for each request you would have to wait tens of minutes. I tried really big models and you can't call it chatting anymore.

    • @myname-mz3lo
      @myname-mz3lo 4 months ago

      Use an Nvidia Jetson Nano or something similar. The GPU is waaay better for running LLMs.

  • @MrMehrd
    @MrMehrd 4 months ago

    Thx bro, I was looking for something like that

  • @holygxd-
    @holygxd- 4 months ago

    Thank you :) I love your content Data Slayer :)

  • @ericgather2435
    @ericgather2435 4 months ago

    Funny enough, I was looking for this today. Nice!

  • @flatujalok
    @flatujalok 4 months ago +3

    Is it just me, or is bro recording this while a little baked?

  • @casady100
    @casady100 4 months ago +1

    What is the case with the monochrome screen with text displayed? How do you do that??

  • @whitneydesignlabs8738
    @whitneydesignlabs8738 4 months ago +16

    Thanks for the video. I have also been experimenting with various LLMs on the Pi5, locally. Have best results with Ollama so far. I am also running these pis on battery power for robotic, mobile use. I am pretty close to successfully integrating local speech to text, LLM & text to speech using 2 pi5s, including animatronics. Fun stuff.

    • @raplapla9329
      @raplapla9329 4 months ago +2

      which STT model are you using? whisper?

    • @evanhiatt9755
      @evanhiatt9755 4 months ago

      I am literally dreaming about doing this right now. I have a pi5 on the way. Let me know how it goes!

    • @ChrisS-oo6fl
      @ChrisS-oo6fl 4 months ago +1

      Most guys running LLMs with Home Assistant handle all the voice recognition and text-to-speech on the Pi and serve the LLM from a local API, so I'd assume two Pis would run fine. Not sure I'd ever play with a base model or even a restrained model though. There are plenty of dope 7B models available, including unaligned models.

    • @whitneydesignlabs8738
      @whitneydesignlabs8738 4 months ago

      @@raplapla9329 I actually run a ping every 60 seconds, and when Internet is available I run some APIs, but when it is not available it falls back to local. So for STT, I'm using Google's free service when Internet is available and will use Whisper when there's no Internet. Whisper is actually one of the steps I have not installed yet, but will soon. The Google STT is working. I'm also using the Eleven Labs API and pyttsx3 the same way for TTS (Internet/no Internet). This part is working and tested. Same with the LLM: locally working and tested. A Pi 5 handles the local LLM (its only job), a Pi 4 handles speech in and out plus simple animatronics, and another Pi 5 manages overall operations and runs the MQTT server. All of them exchange message data over MQTT on the robot's internal wifi.
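
      A bare-bones sketch of that online/offline routing (the backend functions are placeholders for Google STT, Eleven Labs, Whisper, pyttsx3, etc.):

      ```python
      # Ping a well-known host; use cloud services when online, local ones when not.
      import subprocess

      def internet_up(host="8.8.8.8"):
          """One ping, 2-second timeout; True if the host replied."""
          return subprocess.run(
              ["ping", "-c", "1", "-W", "2", host],
              stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
          ).returncode == 0

      def cloud_stt(audio): return "transcript via cloud API"      # placeholder
      def local_stt(audio): return "transcript via local Whisper"  # placeholder

      def transcribe(audio):
          return cloud_stt(audio) if internet_up() else local_stt(audio)

      print(transcribe(b"...raw audio..."))
      ```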

    • @whitneydesignlabs8738
      @whitneydesignlabs8738 4 months ago

      @@ChrisS-oo6fl Interesting! I also have a Pi 3 running Home Assistant, with plans to integrate it into the robot architecture. My current issue with Home Assistant is I can't seem to get Node-RED installed on it. The robot uses Node-RED, and I would love to make use of the GPIO function in Node-RED on Home Assistant with all this. But I'm stuck on Node-RED...

  • @samjco
    @samjco 4 months ago +1

    So, I've installed privateGPT on a gaming laptop with an RTX 4060, and it worked. Speed was so-so, even after enabling the LLM to use the GPU instead of the CPU. I'd be interested in knowing which configuration yields the fastest response. I've seen PCIe-to-M.2 adapters that enable the use of an external GPU, since GPUs process AI workloads faster than CPUs, I've heard. What hardware combination would you recommend for speed and portability?

    • @mhaustria
      @mhaustria 4 months ago

      Same here: i7 13700KF, 4060 Ti 16GB, 160GB of RAM. privateGPT on CPU is pretty slow; with CUDA enabled it runs well, but not as fast as on this Pi 5? So what's the magic here?

    • @sitedel
      @sitedel 4 months ago +7

      Simple answer: the video has been accelerated.

  • @TobiasWeg
    @TobiasWeg 4 months ago +2

    Hm, Ollama seems to use way less RAM than the model should actually need. Or at least htop did not seem to pick up the substantial increase in memory use one would expect from loading a 7B model.
    Can anybody explain why?
    I saw the same for Mixtral on my laptop: it ran even though only about 3.7 GB of RAM was occupied instead of the 30GB that would be expected.

  • @_IamKnight
    @_IamKnight 4 months ago

    Would I be able to run Mistral 7B as a casual chatbot with short responses (with a 4-second max time to first token or so) on desktop using the Coral AI USB accelerator, or even two?
    If you could test it on your setup, I'd be very thankful. Pls respond :>

  • @mbunds
    @mbunds 4 months ago +3

    It would be fascinating to work out a way to cause multiple small edge computers hosting LLMs to work in synchrony. A cluster of Pi 5 SBC's could narrow the memory gap required to run larger models, providing more accurate responses if not measurably better performance. There would be a lot of tradeoffs for sure, since the bulk of these currently seem to be created to run within a monolithic structure (composed of massively parallel hardware GPUs) which does not lend itself as well to "node-based" distributed computing on consumer-grade processing and networking hardware, so I wonder if the traffic running across the network meshing multiple processors would create bottlenecks, and if these could operate on a common data store to eliminate attempts to "parse" and distribute training data among nodes?
    I have the feeling that the next step toward AGI will involve using generative models in "reflective layers" anyway, using adversarial models to temper and cross-check responses before they are submitted for output, and perhaps others "tuned to hallucinate" to form a primitive "imagination", which perhaps could form the foundation for "synthesizing" new "ideas", for deep analysis and cross-checking of assumed "inferences", and potentially for providing "insights" toward problem solving where current models fall short.
    As one of my favorite YouTube white-paper PhDs always says, "What a time to be alive!"
    Thanks for a great production!

    • @peterdagrape
      @peterdagrape 4 months ago

      The problem is the maximum bandwidth; these models basically need a crap ton of RAM, and sharing a model across multiple Pis is very difficult, though not impossible.

    • @mbunds
      @mbunds 4 months ago

      @@peterdagrape Got it; diminishing returns. With so many Pi cluster configurations out there, I figured there was a reason the Pi people weren't all over this already.

  • @davocc2405
    @davocc2405 4 months ago +2

    Does anyone know of a FOSS text-to-voice engine that would generate speech closer to the quality of the engine he's using here, but locally hosted only? I use eSpeak to verbally announce messages (via a combination of MQTT and a script that reads messages) so I can set a job and forget it; this helps me avoid setting things off and forgetting to check on them, which I invariably do constantly.

    • @leucome
      @leucome 4 months ago +1

      Coqui TTS, Piper, SpeechT5.

  • @Skymack351
    @Skymack351 2 months ago

    Now, imagine these programs running on cellphones! I think we're not very far out from it!

  • @GearAddict90210
    @GearAddict90210 4 months ago +3

    Thank you for sharing this information. It is great to have a local LLM, and it was quite easy to set up after all.
    I did not know that there were so many models available.

    • @myname-mz3lo
      @myname-mz3lo 4 months ago

      Especially having an uncensored LLM. Those might even be illegal one day because of their power.

  • @fintech1378
    @fintech1378 4 months ago

    What's a good way to use LLaVA if we're doing surveillance video processing instead of images?

  • @randomcreation1611
    @randomcreation1611 1 month ago

    Wow, super edit. The speed increases when you ask a question.
    We are not fools; you are caught red-handed by the "uptime" 😂

  • @AndreBarbosaPC
    @AndreBarbosaPC 2 months ago +1

    Awesome video! Thank you ;-) My question is... how could we run these LLMs locally and at the same time have them access the internet to search for stuff they don't have?

  • @erdemguner
    @erdemguner 4 months ago

    Which recording tool are you using?

  • @nathanbollman
    @nathanbollman 4 months ago +6

    The memory doesn't line up with the models you are loading; I'm not seeing any changes in your memory when swapping models... I assume these are GGUF models? And they appear to be running faster than what an RPi 5 is capable of...

    • @AustinMark
      @AustinMark 4 months ago

      but if you buy the $6 guide, all will be explained

  • @jobasti
    @jobasti 2 months ago

    How can I use this LLM and the RasPi to run my own LLM in my IDE? Is there already something that can read my code and help me program, based on that code?

  • @erniea5843
    @erniea5843 4 months ago +1

    Wow, never thought a Pi could perform like this. I was thinking of trying this with a Jetson.

  • @EliahHoliday
    @EliahHoliday 4 months ago

    This looks like a project worth exploring, although the limitation of AI is that it sources data accumulated on the internet and so is subject to biases, which leads to inaccuracies. I'm sure, however, that there would be a way to clean up the data for accuracy if another unbiased reference were easily available.

  • @MovieTank-2002
    @MovieTank-2002 3 months ago

    At the start of the video you were using a terminal that showed ports and other details of the Pi 5. How did you do that? I am new to Raspberry Pi and want to know if it is a piece of software or something else.

    • @willfettu2747
      @willfettu2747 29 days ago

      Enable SSH on your Raspberry Pi, then SSH over port 22 from your laptop to the Raspberry Pi. The terminal you're seeing means he's controlling the RPi from his laptop.

  • @Technicallyaddicted
    @Technicallyaddicted 4 months ago

    Would it be better to run 4x Pi 4 8GB compute modules in a cluster, or one Nvidia Jetson Xavier? Both cost about the same, but the Xavier is built for AI. This seems like a black-and-white question, but the more you dig, the more grey it becomes. Let me rephrase: what is the cheapest computer I can buy to get great LLM and tensor performance using localized AI that requires zero internet? I have a budget of $500. Please help me. I really need the advice.

  • @hackandtech24
    @hackandtech24 4 months ago

    How could we use a bunch of Raspberry Pi clusters with fast memory, parallelized, to run Mixtral 8x7B? Is that even possible?

  • @user-ps9gq4jn9r
    @user-ps9gq4jn9r 2 months ago

    Hi sir, I am making a similar project where I'm using a Raspberry Pi to answer questions in a .py file using Python, but with a voice response to the question. I'm having problems making it work because it's giving errors about the ALSA thing not being located or smth. Could we get in contact so you can help me with it, please? Thanks.

  • @that_guy1211
    @that_guy1211 4 months ago

    Does it need to be on a Raspberry Pi or a Linux-based system? I'm interested in running these models on my Windows system, or even over WSL 2, if it is possible. I'd like some feedback on the possibility of you making a video on it.

    • @jsmythib
      @jsmythib 3 months ago

      LM Studio? A way to run tons of LLMs on Windows.

  • @sarooprince
    @sarooprince 4 months ago

    Hi, I've got some questions about the content in the course. Where can we contact you?

  • @FlyingPhilUK
    @FlyingPhilUK 4 months ago

    How does this compare to something like an Nvidia Orin?

  • @Krebzonide
    @Krebzonide 4 months ago

    Can this do any kind of image generation stuff, like Stable Diffusion?

  • @ThomasConover
    @ThomasConover 4 months ago

    The scientific revolution in the area of advanced mathematics and algorithms is just amazing these days. ❤❤❤

  • @phamngocson-do5go
    @phamngocson-do5go 4 months ago +1

    I want to know about the latency. Can it be fast enough for a real-time conversation?

  • @jonathanbutler6635
    @jonathanbutler6635 4 months ago +2

    Can you add the Coral AI M.2 accelerator to the Pi 5 and test it yet?

  • @dayhta
    @dayhta 4 months ago +1

    Alright, you've got me. The CPU joke at the beginning hooked me, but the technical content in the video has definitely inspired the current projects I'm working on. I've gotten to the self-hosted AI part; I just want to see if there are some efficiencies I can add so this can be lightweight and used in applications.
    Subscribing for the hydro crypto miner and future projects!
    Thanks, Data Slayer!

  • @jawadmansoor6064
    @jawadmansoor6064 4 months ago +4

    What is the output token speed, in tokens per second, on the RPi 5?

    • @gaspardbos
      @gaspardbos 4 months ago

      Yeah, my question also. If I use some of these 7B models on my M1, the tokens/sec is just not fast enough, so I end up resorting to a model behind an API (which is faster) for things like coding. Still excited to run them on my Pi for other, more data- and privacy-sensitive use cases, or where latency permits. 8GB versions were sold out, last I checked...

  • @user-vl4vo2vz4f
    @user-vl4vo2vz4f 4 months ago

    simply brilliant 😮

  • @SmirkInvestigator
    @SmirkInvestigator 4 months ago

    What's the case with the screen? Is that for the RPi 5?

  • @michaelzumpano7318
    @michaelzumpano7318 4 months ago +3

    Oh, that's just awesome. Edge AI. Just confirm if you would... the Google voice was not generated in real time with a webhook or API, right?

    • @nash......
      @nash...... 4 months ago +1

      No, generation time was very slow. That had to have been put together in post-production.
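
      For reference, a sketch of generating such a clip with the google-cloud-texttospeech client linked in the description (the voice name is just an example):

      ```python
      # Synthesize a WaveNet voice clip; requires Google Cloud credentials.
      from google.cloud import texttospeech

      client = texttospeech.TextToSpeechClient()
      response = client.synthesize_speech(
          input=texttospeech.SynthesisInput(text="Hello from the Raspberry Pi 5!"),
          voice=texttospeech.VoiceSelectionParams(
              language_code="en-US",
              name="en-US-Wavenet-D",  # example WaveNet voice
          ),
          audio_config=texttospeech.AudioConfig(
              audio_encoding=texttospeech.AudioEncoding.MP3
          ),
      )
      with open("answer.mp3", "wb") as f:
          f.write(response.audio_content)
      ```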

  • @dreamofeternalhappiness8001
    @dreamofeternalhappiness8001 4 months ago +1

    👾 Morning coffee tastes great while learning useful things. I express my thankfulness for the important video.

  • @armisis
    @armisis 17 hours ago

    I want to do this with a Coral USB accelerator on top, with webcam object and face recognition and voice interaction, and link it to control my home by accessing my existing Home Assistant Raspberry Pi 5 device.

  • @Baowser210
    @Baowser210 2 months ago

    What power detector is that?

  • @ChrisS-oo6fl
    @ChrisS-oo6fl 4 months ago

    Running certain models locally is extremely slow on my laptop. I wondered if the Pi could do it. I figured someone had already tried with local AI or oobabooga.
    But I'm very confused why you didn't try any really good 7B uncensored models. If you're gonna run a local LLM, why would anyone want a censored/aligned or base model? Can you list the ones you tried with success?

  • @snopz
    @snopz 4 months ago +1

    It would be useful if we could add more RAM via the Pi 5's M.2 slot so we could run the 13B models.

  • @tlubben972
    @tlubben972 4 months ago

    Was TinyLlama not out yet? Thinking about doing it with that.

  • @madwilliamflint
    @madwilliamflint 4 months ago +1

    This is game changing. I love your stuff. But man it sounds like you're falling asleep in the middle of your video.

  • @paulocsouzajr8241
    @paulocsouzajr8241 4 months ago

    Is there any "How to" or maybe a "Step-by-step"? I have a Raspberry Pi 3B+ and a useless Orange Pi A20... Is it possible to use them in any way?
    Congrats on the great job!!

  • @JeremyJanzen
    @JeremyJanzen 4 months ago +2

    Looks like this was running fully on CPU. Can this workload not run on the Pi GPU?

    • @Hardwareai
      @Hardwareai 4 months ago +1

      Possible with clBLAS, but it won't be faster. It can offload the CPU at best.

  • @Philip8888888
    @Philip8888888 4 months ago

    Does this use CPU or GPU on RPI5?

  • @WINTERMUTE_AI
    @WINTERMUTE_AI 4 months ago

    Will this work on a CM4 8GB board?

  • @user-jw8sk4vz9x
    @user-jw8sk4vz9x 4 months ago +1

    Hey, bro! Could you please make a video on how to install this? I'd really appreciate it!

  • @cfg83
    @cfg83 2 months ago

    Woo hoo! I just got a Pi 5 @ 8GB and I was wondering what to do with it. I wonder no more!!!

  • @antony950425
    @antony950425 4 months ago

    What’s the terminal software?

  • @dilboteabaggins
    @dilboteabaggins 2 months ago

    This presents a very interesting use case.
    Is it possible to feed technical manuals into one of these models and then ask them specific questions about the content of the manuals?
    It would be really neat if you could take a picture of an error code from a machine, send that pic to the AI model, and then have it provide information about the errors or faults.

  • @delawarepilot
    @delawarepilot 4 months ago +1

    So you are saying you invented an offline encyclopedia, we’ve come full circle.

  • @ronitlenka2508
    @ronitlenka2508 4 months ago

    Hey, can you make a voice assistant with ChatGPT 3.5 using a Raspberry Pi Zero, with a battery for power? The device would be portable and easy to carry.

  • @birkinsornberger263
    @birkinsornberger263 22 days ago

    Look at 7:39: at the top right of the screen you can see "Uptime:", and the jump is about 36 seconds. That seems like a pretty long wait time to me.

  • @drewsipher
    @drewsipher 4 months ago +4

    Thanks for the video. It was interesting to see what the Pi 5 can do.
    I do think, however, it's a huge mistake and misinformation to say that LLMs contain any of the information they were trained on. The models are trained to finish a sentence, to guess what the next word is, and do not contain any of the actual training data. I feel like this is important, so that we know how to trust LLMs properly.

    • @bakedbeings
      @bakedbeings 4 months ago +1

      LLMs, including GPT, do memorise things, though they're not built directly for that purpose. Try entering this into ChatGPT:
      Finish this sentence: "As Mike Tyson once said, "
      You can guess what it responds. If you have strong weights for words in a sequence that match a given article or quote that appears thousands of times across the web/training data, it's effectively memorised. Look into the NY Times lawsuit.

    • @theaudiocrat
      @theaudiocrat 4 months ago

      You can absolutely get models to spit out training data with text completion and the right parameters. In fact, most "censored" models will even give up the "censored" bad-think ideas that they're not supposed to give you when you know how to prompt them to do so, and you already kinda touched on the reason WHY you can do it.

    • @drewsipher
      @drewsipher 4 months ago

      @@bakedbeings You're right that you can extract knowledge (sort of the whole point of an LLM). I only meant to highlight the differences in how models "remember" things. It's closer to how humans remember than to actual computer memory. There is also a random seed for most models that can change the output.

  • @etyrnal
    @etyrnal 3 months ago

    There are a bunch of NVMe HATs out there, but a lot of people are having problems getting them to work: issues with booting, recognition, boot order, compatibility, etc.

  • @jiahaochen4117
    @jiahaochen4117 4 months ago

    What's the shell called? Where can I buy it? 🎉🎉 It's so fascinating.

  • @midwinter78
    @midwinter78 4 months ago

    The speed is *perfect*. Now run it on a green CRT and give those little sound effects as the words come out at reading speed and it'll be just like being in a Hollywood movie.

  • @user-jy5jh9hn9h
    @user-jy5jh9hn9h 4 months ago +3

    I actually was thinking about putting a model on a Raspberry Pi; looks like you beat me to it. But what about putting the Raspberry Pi on a drone and getting the AI to fly it???

    • @Simon-qe8ph
      @Simon-qe8ph 4 months ago

      I was thinking the same!😂

    • @user-jy5jh9hn9h
      @user-jy5jh9hn9h 4 months ago

      I know! Imagine having this drone AI army that you can command. It's kinda like Jarvis when Tony told him to send all the Iron Man suits in Iron Man 3.

  • @guitarbuddha74
    @guitarbuddha74 4 months ago

    rmdir won't delete directories recursively that way, btw. You also don't need root if you own the empty dir. At least you can try it easily enough.

  • @user-sz3cs6nj5q
    @user-sz3cs6nj5q 4 months ago

    Dude, LLMs are great to see on small boards. Any possibility of running AI image gen using Stable Diffusion, at least with base models?

  • @alpineflauge909
    @alpineflauge909 4 months ago

    world class content

  • @ex1tium
    @ex1tium 4 months ago

    I have two RPi 5 8GB models and an RPi 4 currently sitting on my desk. Do you know if it's possible to build some sort of cluster with them for LLM computing? I've been playing with local LLMs on my PC, mainly for software development, but running some smaller LLMs could be cool given the power efficiency of RPis.
    //edit Oh, you addressed the clustering stuff later in the video.

  • @killermouse0
    @killermouse0 4 months ago

    Super inspiring! I had no idea it was so simple to get this running locally. Amazing!

  • @irondragon06
    @irondragon06 2 months ago

    Code Llama had a misleading answer for async/await. It made it seem as if JavaScript was the first use of the syntax in 2017 and then other languages like C# adopted it. C# had async/await in 2012, I think, which derived from 2010 F#. Also, I believe Python had the feature in 2015.

  • @rickt1866
    @rickt1866 4 months ago

    Hey, try running it on the NVIDIA Jetson Orin Nano Developer Kit, or a Jetson? From my understanding it's optimized for/built with AI in mind.

  • @dontworry7127
    @dontworry7127 4 months ago

    Have you tested it with the Google Coral USB Accelerator and a camera?

  • @ewanp1396
    @ewanp1396 4 months ago

    Isn't PrivateGPT doing RAG rather than actually doing any training?

  • @BxKRs
    @BxKRs 4 months ago

    After install, when I try to run any of the models I just get “No such file or directory”

  • @fire17102
    @fire17102 4 months ago

    Yo Data Slayer, make the screen from the thumbnail real (+mic) and do a Rabbit R1 competitor. Looks dope

  • @sprinteroptions9490
    @sprinteroptions9490 4 months ago

    How long did it take for LLaVA to return results from the selfie? Lotta use cases there alone. Imagine you're a spy looking for a particular person: you're walking around in public with your LLaVA LoRA model taking a pic a second. Neat.

    • @sprinteroptions9490
      @sprinteroptions9490 4 months ago

      You're also keen on power draw... how long could a spy walk around town taking as many pics as possible, drawing on a couple of cheap power banks? Just spitballing. Subscribed.

  • @guinea_horn
    @guinea_horn 4 months ago

    How do I dislike only the Egyptian cotton joke but like the rest of the video

  • @CilantroSativum
    @CilantroSativum 4 months ago

    I have an old laptop with 4GB RAM and a Core i3 processor. Can I have my own AI running on this machine, a kind of offline AI? Thanks.

  • @dr.mikeybee
    @dr.mikeybee 2 months ago

    I don't think GPT-4's parameters are all in the same model; GPT-4 is a mixture of models. I suspect that although they run on gigantic arrays, individual models run on a single H100.

  • @madmax2069
    @madmax2069 3 months ago

    I'd like to see one of those ADLINK Pocket AI external setups running on a Pi, and see how that affects AI performance.