Conda (Anaconda or Miniconda) resolves Python dependency issues by creating a separate environment with the specified version of Python, and isolates that environment along with anything Python-related loaded into it (i.e. if you install with pip or conda, it stays within the environment and leaves the system's Python alone).
This lets me use alpha versions of Ubuntu with any Python programs I want by keeping a separate environment for each program.
Yeah it's weird he didn't know this
Yeah, putting things in a virtual environment works, but pycoral still has some issues with its pycoral.adapters module. I have a Raspberry Pi 5 and a Pi Camera Module 3, and I'm still having issues even when using venv.
The Coral TPU M.2 / Mini PCIe version itself is PCIe 2.0 x1, i.e. 500 MB/s, so theoretically it's "slower" than the USB 3 version in every case. Obviously the answer isn't as simple as that, but saying the PCIe version is faster because USB 3 bandwidth is only 625 MB/s isn't the right answer either. You have to take into account various overheads, driver optimizations, etc.
Plus, it's not like the Coral can process data that quickly anyway. At least not in any workflow I've seen; I'd be happy to be proven wrong.
The Dual Edge version needs a PCIe x1 switch, since I don't think there is a system that can bifurcate an x4 connection into two x1 lanes. So the existence of the Dual Edge version tells me it was designed to share a single x1 lane between two x1 TPU devices, meaning 500 MB/s is probably enough for two TPU devices in a real-world application.
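A quick back-of-the-envelope in Python of why one shared lane is probably enough, assuming a 300x300 RGB uint8 input tensor (typical for the SSD MobileNet class of models Coral ships); all numbers are illustrative, not measurements:
```python
# Can one PCIe 2.0 x1 lane (~500 MB/s usable) feed two Edge TPUs?
LANE_BYTES_PER_S = 500e6      # PCIe 2.0 x1 after 8b/10b encoding overhead
FRAME_BYTES = 300 * 300 * 3   # one 300x300 RGB uint8 input tensor (assumed)

frames_per_s = LANE_BYTES_PER_S / FRAME_BYTES
print(f"~{frames_per_s:,.0f} input tensors/s fit through the lane")  # ~1,852

# Each Edge TPU tops out at a few hundred inferences/s on models this size,
# so two TPUs sharing the lane would still be compute-bound, not I/O-bound.
print(f"~{frames_per_s / 2:,.0f} tensors/s per TPU if split evenly")
```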
I love that you relate Frigate to GoldenEye. I do the same thing, relating facilities to bathrooms because of that game.
PS: there is nothing wrong with using an older version of Python. Use conda or venv to properly version your pip deps.
He could have used Docker.
Person detection. Training the AI to hunt us more efficiently. lol
This is an amazing little thing. I do have an issue with AI being online; having a local device that can monitor and do what it has to do on an offline local network is amazing.
Did you test to see if the USB bus speed is causing a bottleneck as is the premise of the video? 600MB/s is a huge amount of image data to pass over to the accelerator. Is the bus latency an issue?
Would be nice to see a use case that shows where the “faster” you mention matters.
Yeah, I'd say it's only faster if you use the dual-TPU model, if supported.
The speed of USB on the Raspberry Pi is limited by the PCIe bus connection, which is one-lane PCIe 2.0, so 5 GT/s, which translates to roughly 500 MB/s _theoretical_ maximum after encoding overhead. In practice, it's much lower. The PCIe lane exposed on the Raspberry Pi 5 is also PCIe 2.0, although you can configure it for PCIe 3.0 (not officially supported), which gives you 8 GT/s. It could be faster in the sense that PCIe has lower latency, although I don't really know if that would have much of an effect in the benchmarks. A Dual Edge Coral TPU would not work, because that uses two PCIe 2.0 lanes.
USB on the Pi 5 should have much more throughput though, since it goes through the south bridge, which gets four PCIe 2.0 lanes. So I wonder if PCIe vs. USB makes that much of a difference, except maybe when it comes to latency. You'd have to test it; it really depends on whether you are bandwidth- or compute-constrained.
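For what it's worth, here's where the various MB/s figures in this thread come from, as a small Python sketch of the theoretical link rates only (real throughput is lower due to protocol and driver overhead):
```python
def pcie2_x1_MBps(gt_per_s=5.0):
    # PCIe 2.0 signals 5 GT/s but uses 8b/10b encoding:
    # 10 bits on the wire carry 8 bits of data.
    return gt_per_s * 1e9 * (8 / 10) / 8 / 1e6

def usb3_gen1_MBps(encoded=True):
    # USB 3.0 is 5 Gb/s raw; it also uses 8b/10b encoding.
    raw = 5.0e9 / 8 / 1e6            # 625 MB/s -- the oft-quoted figure
    return raw * (8 / 10) if encoded else raw

print(pcie2_x1_MBps())               # 500.0 -> the Coral M.2 / Mini PCIe cap
print(usb3_gen1_MBps(encoded=False)) # 625.0 -> USB 3 before encoding overhead
print(usb3_gen1_MBps())              # 500.0 -> USB 3 after encoding overhead
```
So on paper both interfaces land at the same 500 MB/s, which is why latency and driver overhead, not raw bandwidth, end up deciding the benchmarks.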
For starters, he's using the M.2 A+E-key Coral, not the Mini PCIe one. Not that it matters much; I believe they both have PCIe 2.0 x1 bandwidth. So basically less than the USB throughput, which is itself theoretical, so it could all be a wash and irrelevant.
So you state "cheap" but fail to cover the actual costs of all the hardware! 'sup with that? I'd at least expect that in the description, or a pinned comment, or covered in the video and a summary at the end with some chapter markers. Pretty please?
I did the work for you:
USD$119.90 to $199.90 - Zima Board
USD$64 to $96 - Zima Blade, dual-core and quad-core, respectively (unsure of availability)
USD$24.99 - Coral AI Mini PCIe Accelerator
USD$10 - Mini PCIe Adapter Card (eBay)
So Cheap = about USD$125 for the Blade, AI, Adapter, and $25 for power supply, wires, etc (maybe less)
Questionable Availability though
Still cool, but holy hell you really need to be a BOFH SYSOP to get this working. NOT EASY
I found that he made more of a "showing off" video than a good educational video. He glosses over many subjects without taking the time to explain what he's doing, and the poor montage, jumping from one shot to another, is so annoying.
Well that's less than half of the cost of groceries for the week so… yeah I'd say it's pretty cheap…
$125 for what it's doing, and relative to what $125 can get you nowadays? Yeah, that's cheap.
@@infinitystore2890 just another YouTuber making "content"
Being able to use it on an Orange Pi 5, which has an M.2 slot, would be neat. Seems like it would work nicely with some error debugging.
Just a reminder: blur is not destructive. Use black/white or colour blocks if you want to censor something.
Enough blur is destructive enough.
@@KCM25NJL tell that to Topaz
Local AI just got easy and cheap... once you fork out for Google proprietary hardware and do some DIY hackery... and it really only has one use case that, if you're at all security-conscious, you won't want anything Google-related having access to... fantastic.
Given the keying of your M.2 module, it should have a max transfer rate of 250 MB/s, or maybe 500 MB/s if using both PCIe channels.
Pretty sure you leaked your public IP address on your router webpage at around 3 minutes in. Better blur it to be safe. This project is novel to me; it's just that the view screen within macOS is quite faint when it moves along with the mouse pointer, especially at 9:54.
I'm blown away you were able to get this piece of Google abandonware up
Thank you for this. ;-) I have a ZimaBoard 832, with Coral TPU USB & PCIe.
Suggestion: use an IP camera to lessen the complexity and length of your video, and save the RTSP magic for a separate video; most viewers who want detection will be using IP cameras. While I use 13 Reolink cameras, beware that Reolink uses LIVE555 for its RTSP server, and earlier, cheaper models generate problem packets. I waited almost a year to get my hands on some Coral USBs; it's good to see the M.2s available, and that certainly expands my options. Thank you.
You seem to know a fair bit about security cameras - might I pick your brain for info? What protocol do they use? Is there open-source software that can make use of any/all security camera makes/models? Can just any security camera be managed this way, or does each brand have its own proprietary protocols?
What are some good IP cameras for Frigate? Please share the best for overall use and the best for nighttime.
@@Kaalkian Frigate's developer is your best bet for answers you seek.
Love the video! The audio is a little wonky at points. Loud and then quiet then REALLY loud then quiet. Hurt my ears with my earbuds in. Thanks for the info!
I don't know how you do your video editing, but I really like the zoom and other effects thrown in here. Nice walkthrough.
*gains sentience* as you plugged in the TPU made me lol
Thanks for the demo and info, have a great day
So freaking cool. If you were to deploy a setup like this, would you stick with the TPU person detection, or would you opt for more general-purpose motion detection that would pick up cars, dogs, people, birds, or whatever, and wouldn't require a dedicated TPU?
You do realize that the Raspberry Pi 5 PCIe is only PCIe 2.0 and x1 lane right? That has a maximum theoretical throughput of 500 MB/s, which is less than the USB version of the Coral TPU... So when using a Pi 5, the USB Coral would absolutely be the better choice.
PCIe 3.0 works just fine on the Pi 5.
@@AlmightyEye But the TPU itself is limited to PCIe 2.0. This also implies that both the M.2 and USB versions are capped at 500 MB/s.
I'm sure glad it got easy.
You could of course have pulled out your trusty old PC, opened it up, and had several PCIe slots available. Would it work to have a number of these in parallel, I wonder...? A mining rig...
Good thing you have one in hand. I think 2-3 years ago I went to buy an older Coral priced at $60; next thing, they were scooped up in every direction and vendors were selling them at $190, even $300. Even if one could afford that, they were hard to find, and lead times were spaced out like months for the next available batch (same thing with the older versions of the Raspberry Pi).
Yes, hardware is one thing, but Google's software support is lacking. I had my own struggles with it. NVIDIA CUDA is supported much better. Plus, with Google, you never know when they'll kill it off and leave you high and dry. I can't recommend it.
I got the Mini PCIe version of the Coral TPU
I put it in a Xeon E3-1265 v2 Dell OptiPlex 7010 USFF
It stomps man
What would be a cool graduation project using some other APIs and LLMs along with the Coral? I was thinking face recognition and auto-lock based on faces, but I would like to hear your take! Thanks for inspiring us to do it ourselves! I hope one day I can start a tech community like yours!
Senior design project for an Electrical Engineering major and software engineering minor*
Does not seem like enough to me... all of that already exists today; think of quantized NNs on very basic hardware. There is no scientific insight in that unless you compare it to existing systems and evaluate different metrics of the device, etc.
Or maybe try to absolutely maximize a certain aspect, like power efficiency and/or execution time, or something.
LLMs on a TPU like this are like trying to sew with oven mitts on. You *could* run a mid-tier LLaMA LLM if you added some extra RAM to an SBC or mini PC. Cloud-based ML doesn't really speak to the Coral AI's value proposition (edge AI). The Coral TPUs are cheap and can process hundreds of frames a second; you could make a state-of-the-art surveillance system for under $100. The Coral runs TF Lite, and Google has a number of pre-trained models that you could have some fun with.
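To give a feel for how little code the fun part takes, here's a minimal detection sketch using pycoral's documented API; the model and image filenames are placeholders (any Edge TPU-compiled SSD detection model from Google's test-data collection should slot in):
```python
from PIL import Image
from pycoral.utils.edgetpu import make_interpreter
from pycoral.adapters import common, detect

# Load an Edge TPU-compiled model (placeholder filename).
interpreter = make_interpreter("ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite")
interpreter.allocate_tensors()

# Resize the frame to the model's input size and hand it to the TPU.
image = Image.open("frame.jpg").convert("RGB").resize(common.input_size(interpreter))
common.set_input(interpreter, image)

interpreter.invoke()  # the actual inference -- typically a few milliseconds

for obj in detect.get_objects(interpreter, score_threshold=0.5):
    print(obj.id, obj.score, obj.bbox)
```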
I personally think developers should be focusing on making AI have lower requirements. It would be amazing if my $300 phone could run LLMs locally. That could also be amazing for things like servers, where it could dramatically increase their maximum workload.
If we could run multiple AI models on a single system, we could absolutely have a functional android very soon.
The annoying thing with Google TensorFlow is how often stuff breaks with new releases. Over the last 20 years, I've lost count of how many times Google decided to break backward compatibility in their projects. Google's idea of open source kinda blows.
It's all their software, really. They just don't care; they have the attention span of a toddler. Microsoft is as evil as Google, but at least they make good dev tools.
"This is awesome! I was working with YOLO and weapon detection and was thinking, 'If only this solution were cheaper and more portable, just like your solution.' It would save precious minutes by preemptively invoking a lockdown and alerting emergency services. I believe it's solutions like this that will make a world of difference in the future of technology. We definitely need to be careful about how and when it's used as well, and to create a digital framework for responsible use. However, I believe there are places where its use is warranted, like in schools, for instance.
They actually have AI scanners now. You can see them at many stadiums and convention centers. They're two posts with a bunch of cameras and sensors, and they use AI to process a scan of your body and detect weapons. Off the top of my head, I believe it's similar to the scanners at the airport where you raise your hands, except you simply walk through this one.
It's 2024 and Linux still makes people use a DOS-style prompt for things, like a caveman. Hey look, I'm dialing up WOPR with my 2400-baud modem in my terminal program.
So much room for activities
I use Wyze cameras running hacked firmware so they're dumb and stay local, not sending my video and audio to China first before letting me see it. And I have them set up in the backyard to make sure there are no bears, mountain lions, raccoons, skunks, etc. before I let my dogs out in the middle of the night to pee. But if you had young kids, this would be a MUST to check for animals and Disney employees before letting them out in the middle of the night to pee.
It would be interesting to know if it can run Stable Diffusion, and how fast.
What software did you use to screen-record while talking at 4:06? That flow was so smooth.
It's called Screen Studio
Nice work, binge-watching your stuff! What's the screen recording software you're using, with the rounded-borders face cam bottom right?
The USB one isn't only slower because of the interface; it's also underpowered. The one you got isn't the greatest either; you should have gotten the Dual Edge, which is by far the best price-to-performance ratio.
What brings a person to learn this kind of stuff, and where do I sign up?
Just being a nerd or a dork. The same thing that brings people to games or sports
We signed up at birth. You KNOW it when you are a geek at heart 😅
Want this on a slot for my Framework 16 instead of the graphics card.
Get an M.2 carrier and instead of a second SSD, use the TPU :)
Would love to see this on the Pi 5, since it has PCIe, to see what it's capable of.
4:13 /dev is devices (and not drivers). There was no apex_0 because the device wasn't plugged in yet.
The Raspberry Pi 5 is indeed pretty far behind the curve.
But it also has the biggest community support.
ARM boards will be behind for a while; pretty sure the ZimaBoard is x86.
@@jewlouds Arm CPUs often have 4x PCIe 2.0 or 3.0 available. Raspberry Pi 5 is far behind the curve among ARM boards too.
@wombatillo Yeah, but driver support for most things ARM is severely lacking. And from my understanding, the RPi, being the device with the largest community, would get specific driver support before anything else. x86 is way more likely to have compatible driver support simply because everything is already running on x86 hardware.
Not in community support. And don't forget the RPi was intended as a single-board computer for tinkerers/learning. I know it's not seemingly pushed that way, but it's not meant as a powerful ARM computer.
Got it working as a HA add-on, smokes my Amcrest doorbell sensor. "Chihuahua Pitbull mix detected, 109% certainty". Seriously though, wondering when GPU support will work.
I guess you already know, but you should use compose files; otherwise maintenance of your containers will become hell.
At 4:06, the special file under /dev will only appear when you connect the device. This command does not check for the driver module. The appropriate command would be something like modinfo.
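A small sketch of the distinction, if you want to script the check; this assumes the standard names used by Google's gasket/apex PCIe driver:
```python
from pathlib import Path

# The /dev node only appears once the kernel has enumerated the card...
device_present = Path("/dev/apex_0").exists()

# ...whereas /proc/modules tells you whether the apex driver is actually loaded.
module_loaded = any(
    line.split()[0] in ("apex", "gasket")
    for line in Path("/proc/modules").read_text().splitlines()
)

print(f"device node present: {device_present}, driver loaded: {module_loaded}")
# `modinfo apex` works even with no card plugged in: it checks that the
# module is installed on disk, not that the hardware is present.
```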
Please point out a moment in the video where I can see something "fast".
Any chance this thing could be paired with a GPU Mining Rig running eight RTX 3070ti cards?
Hi there, nice video! I really thought these things were far more expensive than they actually are.
However, what's your take on using these things for larger models? I mean, it's my understanding these things do NOT have their own memory, so they'd be sequentially exchanging data with the main computer's memory for each (or every few) operation, so they are BOUND to be slow and high-latency, having only those PCIe lanes available, aren't they?
Say I'd like to run even just a 7B LLaMA, or something in the low hundreds of MB, like the MarianMT translation models; this would be effectively useless, at least for real-time processing, wouldn't it?
Also, and this could be a real bonus for these little things, can they be used *simultaneously* to run a single model?
I mean, could I eventually plug 3 or 4 of these things into one single machine and use them for inference of a given model at 12 or 16 TOPS?
I don't think LLMs are the best use case for edge computing. LLMs require very little data streaming (literally a sentence to a few paragraphs of text), and the latency of a response is tolerable (theoretically it would take less than a second to send text to the cloud and receive the reply if the GPTs could run instantly). This discounts privacy concerns, but you could also hook up your own big stationary GPU and send data to your house instead of the cloud in the same way.
Anyway, the point is you want to think about edge computing for things where the upload time for the data to be processed would be greater than the time for the small GPU to actually do the work. That's things like video, lidar, and audio streams.
Another interesting case is when you kind of blur the lines: you use the local mini GPU to embed the live stream so you can send the smaller embeddings to the cloud for a more complex model that requires a big GPU. Basically, edge devices act as data compressors to reduce the necessary upload bandwidth.
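Rough numbers for that compressor idea, as a quick Python sketch; the stream bitrate and embedding size are made-up but plausible assumptions:
```python
VIDEO_KBPS = 4000                          # assumed: a typical 1080p30 H.264 stream
EMBED_DIM, FLOAT_BYTES, FPS = 512, 4, 30   # assumed: one float32 vector per frame

embed_kbps = EMBED_DIM * FLOAT_BYTES * 8 * FPS / 1000
print(f"video upload: {VIDEO_KBPS} kbps")
print(f"embedding upload: {embed_kbps:.0f} kbps "
      f"(~{VIDEO_KBPS / embed_kbps:.0f}x less)")   # ~8x, before any quantization
```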
Local AI is the future.
I can't take any YouTuber seriously if they don't have green and purple uplights in the background.
God, man... the only thing I can think of is an AI exoskeleton with electrode sensors running up the muscles to help predict the wearer's movement 😅
Maybe I missed it, but what adapter card did you plug your TPU into to be able to use it in a PCIe slot?
One step closer to Skynet
So how do you use this for AI? Because all I see right now is a very hard-to-set-up Ring camera.
You can see the AI bits demonstrated at around 12:30 - it's running inferences on a TensorFlow model to perform object detection. You can detect cars, people, cats, dogs, etc. in custom zones throughout the frame.
He made it a bit more work to install than it has to be (though I'd argue that a 13-minute video doesn't constitute "very hard to set up"). It would be easier using `docker compose` (as opposed to `docker run`), and making it work with a webcam was extra work.
And it's not just a Ring camera, it's a Ring camera that doesn't send your video to the cloud, doesn't harvest your data, works even when your Internet is down, gives you full control, etc. It's very powerful and useful.
@@reidprichard I get what you "would" use it for, my question is "How". I'm not seeing any of that in this video
Wonder how much the Coral sped up the identification over just the CPU.
This guy is like a super gay version of the Raspberry Pi guy
I'm really confused. Your title is "Local AI JUST got Easy." I've been using a Coral TPU with Frigate exactly like this since August 2021 - literally two and a quarter years before you made this video - and I wasn't a cutting-edge user by any means. This really isn't new or any different from what we've had for a long time.
Running Frigate with an 8th-gen i5 and a Radeon 5450 at 17 ms inference: 6 cameras at 640x360 for person detection, with 3 MP recording. The TPU seems outdated now.
That the Coral TPU only works with older Python versions is a big disadvantage... I struggle with it too, since I am very accustomed to using new functionality from Python 3.10-3.12.
It's not ideal, but you could always have your Coral code act as a local server running in a separate process and have your main logic run on the newer Python. This also gets around the Global Interpreter Lock.
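A minimal sketch of that split, assuming you run this file under the older Python that pycoral supports and call it over localhost from your newer-Python app; the port and function names are made up for illustration:
```python
# inference_server.py -- run under the pycoral-supported Python (e.g. 3.9)
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def run_inference(image_bytes: bytes) -> dict:
    # Decode the frame and call the pycoral interpreter here.
    return {"detections": []}  # stub

class Handler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        reply = json.dumps(run_inference(body)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(reply)

if __name__ == "__main__":
    # Your main app, on Python 3.12 or whatever, just POSTs frames here.
    HTTPServer(("127.0.0.1", 8765), Handler).serve_forever()
```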
In my experience, the preferred method is C++ using the libcoral API, which wraps the TensorFlow Lite C++ API. Nice performance gains! 😁
ZOMG, not you using Docker for the Frigate server and not putting the MQTT broker in the same compose file... zomg, that just made me want to defenestrate myself.
Netvue can already do person detection on a 400 MHz Wi-Fi camera; not sure you need an AI accelerator for that.
Which screen display, mouse, and keyboard are you using with the Raspberry Pi in this video?
I cannot find an adapter so I can plug this Mini PCIe card into a PCIe slot.
How do you know for sure this will be completely local and untapped tho
nice video man ignore the haters
Too much rapid jumping around to follow, and limited ability to follow the commands used. Seems very rushed.
How usable or sensible is this hardware/software for running an LLM (e.g. LLaMA) with a voice assistant?
I have the idea of using it for Assist in Home Assistant (so I don't need the OpenAI API, which can get expensive if used often).
I would like to pass the analysis of video streams from some IP cameras to a Python program of mine that uses some public models and others written by me to determine who is in the house (face recognition), where exactly they are (in the 2D space of the floor plan), and what they are doing (hand and gesture recognition).
With 5-6 cameras I should be able to dispense with dozens of motion sensors, occupancy sensors, brightness sensors, and door/window sensors, and at the same time have a better understanding of what is going on in the house and command home automation accordingly.
What is the best hardware infrastructure to implement something like this?
There's something there on your lips. You might wanna take care of that
On a side note, I've followed it since its release. You can tell how popular a device is (or a motorcycle, whatever) by its community size, specific third-party accessories, etc.
Might as well be a bottle of Zima
My goal is to make an LLM one day, but I don't have a good enough graphics card to handle larger LLMs.
Can you give me an idea of what you would do for strictly an LLM, but one that can handle a 70B model?
Ideas for small but powerful setups, or something specific to that, would be cool. I was ready to buy a ZimaBlade and board and the Coral thing, but I want the best without buying an A100.
The content is interesting, but I couldn't get past the sound issues. Way too echoey.
I ended up going with Libre (3 flavors); the top one fully blows away the Raspberry Pi 4. When pricing gets close to $100 for every item, it's like an auto-eject for happy hobby researching on a hobo budget…
Didn't we already have this with the Intel Neural Compute Stick 2?
Does Intel also sell a neural compute stick for the same purpose?
Can you use that TPU on a PC to train deep learning models much faster?
What’s the portable screen? Where to get one?
Will that Coral TPU speed up my work on the Steam Deck? My Roboflow image detection is getting 6 fps.
I'm curious if it's possible to use the PCIe version via USB with a simple adapter. I understand that it might affect speeds, but could it at least hypothetically function? I'm trying to research this at the moment. Also, the USB version is about $30 more than the PCIe version, so... ya know. Pinching pennies where I can and all that.
I'm curious whether the TensorFlow models are updatable. I.e., could an enterprising person add a model which detects an approaching person (vs. departing)? Same with a vehicle.
Using ffmpeg just like that is... dumb. You need serious re-encoding performance *AND* you need to crank the settings as high as possible, so that the image quality is decent and the inference can make actual f*cking guesses and doesn't miss all the time because you're sending a pixelated 320x240 mess. *OR*: _(hear me out)_ you could use a proper webcam or surveillance cam which pumps out the stream itself.
Why did you use one unit when you can use at least five PCIe units?
Suddenly, someone needed to sell a bunch of slow 3 year old inference chips with very limited use. This is their story.
An RTX 4090 is more energy-efficient than an equivalent number of these things, assuming no sparsity, at 660 INT8 TOPS for ~380 W (ignoring the dual Epyc or PCIe switch cards needed to run 160 of them in the same machine and still come in below the base-clock INT8 performance). I suppose you could attempt to run that many off USB somehow on a 16-lane consumer processor, but it won't be able to cope with the interrupt management and USB overhead, since it's a garbage protocol. That number is off, since the 4090 will be running quite a bit faster than base clock at 380 W draw, but it doesn't matter: 0.576 W/TOP vs. 0.8125 W/TOP. Sparsity potentially halves that number. Of course, most of the models that can be quantized to INT8 will run in INT4 as well, so you could be running everything at ~1.3 INT4 PetaOPS.
Asus makes the only PCIe cards that hold 16 of them per card, and frankly you shouldn't expect any of their products to work correctly. They're either going to require a 1/1/1/1/... x16 bifurcation that nothing supports, or Asus is going to be charging you 500-600 for a two-generation-outdated PCIe switch that they've probably programmed wrong, if their redrivers on Threadripper boards are any indication. The fact that they need a two-slot card to dissipate 54 W on these is a good indication. Their new boards with USB 4 state that the cables shouldn't be removed or swapped until all power to the motherboard is killed - you know, USB-C connectors, those famously stable things. Their last Threadripper board managed to have slots that stopped working if you used other slots, on a platform with 128 lanes available, and lacked enough 8-pin PCIe power to the motherboard to run more than two GPUs... and it still sounded like less of a disaster than their old X99 boards.
TL;DR: if you're going to buy outdated tech junk, instead of feeding money to whatever this video is promoting, hop on eBay and treat your home network to an upgrade: P2P 56 Gb Ethernet / InfiniBand dual-port cards and some SR4 transceivers, so you can quit being insulted by things like 2.5 Gb Ethernet in brand-new machines. Or grab any of the FPGAs that are available, which I'm just going to go ahead and bet will destroy the performance of these things (plus people might actually hire you for "Languages known: VHDL/Verilog" on a resume; they won't for "I SET UP PYTHON AND DETECTED FACES!!!!!!11").
I have heard of at least one graphics card manufacturer installing an NVMe drive on the back of a GPU, and now, knowing that an M.2 AI accelerator exists, I wonder if either of the graphics card companies would include code to directly leverage an installed AI accelerator for their accelerated GPU features.
Many GPUs have AI acceleration built in; why would you install a slower secondary accelerator when you already have a GPU that could run it anyway?
@@Physuo because it's busy processing a different task?
Recording only when there is a person, with security cameras? Metal Gear Solid told me how to deal with that, hehe (cardboard box).
What is the tiny monitor that we see early in the video clip?
USB 3 is supposed to be 5 Gb/s, not 650 MB/s, I'm pretty sure. (Though 5 Gb/s works out to 625 MB/s, so it's a bits-vs-bytes thing.)
I was checking on this but this is already deprecated
Phenomenal
Would this work on the Orange Pi 5 Plus?
Use some of that AI to fix your audio
I love Debian.
So....slower than without it?
I appreciate the Raspberry Pi shade being tongue-in-cheek, but it's getting a little overly sales-pitchy at this point with the ZimaBlade ;). I've dipped my toes outside of the Raspberry Pi ecosystem plenty of times, but it's still the king for its ecosystem and support community. All this niche hardware has its place, and I'm glad there's competition... but once you start caring about more than what the Raspberry Pi can offer, you really should be looking into actual industrial, production-grade equipment. It's really not as cool as you make it out to be, being the prettiest girl, with no friends, at the dance. Harsh, I know, but it gets across how I see the ZimaBlade.
Trying to be on the cutting edge of the hobbyist ARM community is just silly, and just means you have outgrown it. Three months later, all my Pi 5s are handily equipped with M.2 and PoE, and that's given the 3D-print community and third-party accessory providers appropriate time to provide complementary supporting equipment.
Messing with systems like these is all about learning, socializing, communicating and having fun. Raspberry Pi has that on lock.
IMHO, once you've outgrown the Pi you should move right on over to Jetson and higher-end Arduino setups, and start investing in real industrial-quality setups with user bases to match. But that's just me. Investing in platforms that have legs, with big companies and communities behind them, makes spending in this space an actual investment and not just expensive adult toys.
If the Pi 5 supported faster USB and PCIe, it would defeat the entire point of the community it targets and the affordability it maintains, even with the shortages driving up prices on the Pi chips themselves. Having anything faster than Gen 3 PCIe in a project like this would be like putting a Ferrari engine in your lawnmower, and even with the "affordability" of the Zima, it would put you right back into insane price points, with a small fraction of the community and support you'd need when stuff inevitably is buggy or poorly documented.
And... any time something in Raspberry Pi land doesn't work... it's your opportunity to figure it out and get it fixed... and actually have your efforts appreciated by a huge number of people.
that zooming in and out of the screen capture is actually making me barf
Hi, are you for hire, as I need help with building a prototype?
Pretty sure you do not need to add root to any groups ;)
And running sudo as root, priceless :D
Sir, can you please tell me how I can inspect Netflix API traffic?
Why Secure Boot on a Linux computer?
Asus is selling a PCIe x16 card with 8 modules for 1,380 bucks. Considering that the included modules have a total cost of 200, that is a wildly mad rip-off scheme.
Correction: the chips you put on a card cost 20 each instead of 25, which makes the total cost of the chips only 160 bucks. An even greater rip-off by Asus.
I was out when you said "built by Google".
I want to put a Coral on a Clockwork uConsole.