So you state "cheap" but fail to cover the actual costs of all the hardware! 'sup with that? I'd at least expect that in the description, a pinned comment, or covered in the video with a summary at the end and some chapter markers. Pretty please?
I did the work for you:
USD$119.90 to $199.90 - ZimaBoard
USD$64 to $96 - ZimaBlade, dual-core and quad-core respectively (unsure of availability)
USD$24.99 - Coral AI Mini PCIe Accelerator
USD$10 - Mini PCIe Adapter Card (eBay)
So Cheap = about USD$125 for the Blade, AI, Adapter, and $25 for power supply, wires, etc (maybe less)
Questionable Availability though
Still cool, but holy hell you really need to be a BOFH SYSOP to get this working. NOT EASY
I found he made more of a "showing off" video than a good educational one. He glosses over many subjects without taking the time to explain what he's doing, and the choppy montage, jumping from one shot to another, is so annoying.
Well that's less than half of the cost of groceries for the week so… yeah I'd say it's pretty cheap…
$125 for what it's doing, and relative to what $125 can get you nowadays? Yeah, that's cheap.
@infinitystore2890 Just another YouTuber making "content"
Being able to use it on an Orange Pi 5, which has an M.2 slot, would be neat; seems like it would work nicely with some error debugging.
I love that you relate Frigate to GoldenEye. I do the same thing, relating facilities to bathrooms because of that game.
The Coral TPU M.2 / Mini PCIe version itself is PCIe 2.0 x1, aka 500 MB/s, so theoretically it's "slower" than the USB3 version in every case. Obviously the answer isn't as simple as that, but saying the PCIe version is faster because USB3 bandwidth is only 625 MB/s isn't the right answer either. You have to take into account various overheads, driver optimizations, etc...
Plus, it's not like the Coral can process data that quickly anyway. At least not for any workflow I've seen; I'd be happy to be proven wrong.
The dual edge version needs a PCIe switch, since I don't think there is a system that can bifurcate an x4 connection into two x1 lanes. So the existence of the dual edge version tells me it was designed to share a single x1 lane between two TPU devices, meaning 500 MB/s is probably enough for two TPUs in a real-world application.
PS: there is nothing wrong with using an older version of Python. Use conda or venv to properly version your pip deps.
He could have used docker
Did you test to see whether the USB bus speed is causing a bottleneck, as the video's premise suggests? 600 MB/s is a huge amount of image data to pass to the accelerator. Is the bus latency an issue?
Would be nice to see a use case that shows where the "faster" you mention matters.
Yeah, I'd say it's only faster if you use the dual TPU model, if supported.
The speed of USB on the Raspberry Pi is limited by the PCIe bus connection, which is one lane of PCIe 2.0, so 5 GT/s, which translates to roughly 500 MB/s _theoretical_ maximum after encoding overhead. In practice, it's much lower. The PCIe lane exposed on the Raspberry Pi 5 is also PCIe 2.0, although you can configure it to PCIe 3.0 (not officially supported), which gives you 8 GT/s. It could be faster in the sense that PCIe has lower latency, although I don't really know if that would have much of an effect in the benchmarks. A dual Coral TPU would not work, because that uses two PCIe 2.0 lanes.
USB on the Pi 5 should have much more throughput though, since it goes through the southbridge and gets 4 PCIe 2.0 lanes. So I wonder if PCIe vs USB makes that much of a difference, except maybe when it comes to latency. You'd have to test it; it really depends on whether you are bandwidth- or compute-constrained.
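For reference, the arithmetic behind the figures traded in this thread (PCIe 2.0 uses 8b/10b line coding, PCIe 3.0 uses 128b/130b):

$$
5\,\mathrm{GT/s}\times\tfrac{8}{10}\div 8\,\mathrm{\tfrac{bits}{byte}} = 500\,\mathrm{MB/s}
\qquad
8\,\mathrm{GT/s}\times\tfrac{128}{130}\div 8 \approx 985\,\mathrm{MB/s}
$$

The oft-quoted 625 MB/s for USB 3.0 is just 5 Gb/s divided by 8, before its own 8b/10b overhead; after encoding, its raw ceiling is the same 500 MB/s.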
For starters, he's using the M.2 A+E Coral, not the Mini PCIe one. Not that it matters much; I believe they both have PCIe 2.0 x1 bandwidth. So basically less than the USB throughput (which is itself a theoretical figure), so it could all be a wash and irrelevant.
Conda (Anaconda or Miniconda) resolves Python dependency issues by creating a separate environment with the specified version of Python, and isolates that environment along with anything loaded into it (i.e. if you install with pip or conda, it stays within the environment and leaves the system's Python alone).
This lets me use alpha versions of Ubuntu with any Python programs I want, by having a separate environment for each program.
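A minimal sketch of the same isolation idea with the standard library's venv module (unlike conda, venv can't pin a different interpreter version; the environment uses whichever Python created it):

```python
# Create an isolated environment in ./coral-env with its own pip;
# packages installed into it never touch the system Python.
import venv

venv.create("coral-env", with_pip=True)

# Then, from a shell (exact package source depends on the wheel that
# matches your platform and Python version):
#   coral-env/bin/pip install pycoral
#   coral-env/bin/python my_detector.py
```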
Yeah it's weird he didn't know this
Yeah, putting things in a virtual environment works, but pycoral still has some issues with its pycoral.adapters module. I have a Raspberry Pi 5 and a Pi Camera v3, and I'm still having issues when using venv.
Person detection. Training the AI to hunt us more efficiently. lol
This is an amazing little thing. I do have an issue with AI being online, though; having a local device that can be used to monitor and do what it has to do on an offline local network is amazing.
Pretty sure you leaked your public IP address on your router's web page at around 3 minutes in. Better blur it to be safe. This project is novel to me; just that the view screen within macOS is quite faint when it moves along with the mouse pointer, especially at 9:54.
Given the keying of your M.2 module, it should have a max transfer rate of 250 MB/s, or maybe 500 MB/s if using both PCIe lanes.
I'm blown away you were able to get this piece of Google abandonware up and running.
Thanks for the demo and info, have a great day
I don't know how you do your video editing, but I really like the zoom and other effects thrown in here. Nice walkthrough.
So freaking cool. If you were to deploy a setup like this, would you stick with the TPU person detection, or would you opt for a general-purpose motion detection that would pick up cars, dogs, people, birds, or whatever, and wouldn't require a dedicated TPU?
Thank you for this. ;-) Have a ZimaBoard 832, with Coral TPU USB & PCIe.
I got the Mini PCIe version of the Coral TPU
I put it in a Xeon E3-1265 v2 Dell OptiPlex 7010 USFF.
It stomps, man.
4:13 /dev is devices (and not drivers). There was no apex_0 because the device wasn't plugged in yet.
Just a reminder: blur is not destructive. Use black/white or colour blocks if you want to censor.
Enough blur is destructive enough.
@KCM25NJL Tell that to Topaz.
*gains sentience* as you plugged in the TPU made me lol
I'm really confused. Your title is "Local AI JUST got Easy". I've been using a Coral TPU with Frigate exactly like this since August 2021, literally two and a quarter years before you made this video, and I wasn't a cutting-edge user by any means. This really isn't new or any different from what we've had for a long time.
What would be a cool graduation project using some other APIs and LLMs along with the Coral? I was thinking face recognition and auto-lock based on faces, but I'd like to hear your take! Thanks for inspiring us to do it ourselves! I hope one day I can start a tech community like yours!
Senior design project for an Electrical Engineering major and software engineering minor*
Doesn't seem like enough to me... all of that already exists today; think of quantized NNs on very basic hardware. There's no scientific insight in it unless you compare it to existing systems and evaluate different metrics of the device, etc.
Or maybe try to absolutely maximize a certain aspect, like power efficiency or execution time, or both.
LLMs on a TPU is like trying to sew with oven mitts on. You *could* run a mid-tier LLaMA model if you added some extra RAM to an SBC or mini PC. Cloud-based ML doesn't really speak to the Coral's value proposition (edge AI). The Coral TPUs are cheap and can process hundreds of frames a second; you could make a state-of-the-art surveillance system for under $100. The Coral runs TF Lite, and Google has a number of pre-trained models you could have some fun with.
I personally think developers should focus on making AI have lower requirements. It would be amazing if my $300 phone could run LLMs locally. That could also be amazing for servers, where it could dramatically increase their maximum workload.
At 4:06: the special file under /dev will only appear when you connect the device. That command does not check for the driver module; the appropriate command would be something like modinfo.
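A small sketch of the distinction, assuming the Coral PCIe driver loads as the apex kernel module and creates /dev/apex_0 (which matches Coral's docs as I recall them):

```python
from pathlib import Path

def module_loaded(name):
    # /proc/modules lists every kernel module currently loaded
    with open("/proc/modules") as f:
        return any(line.split()[0] == name for line in f)

def device_present():
    # the device node only appears once the card is attached and bound
    return Path("/dev/apex_0").exists()

print("apex driver loaded:", module_loaded("apex"))
print("Coral device node present:", device_present())
```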
You could of course have pulled out your trusty old PC, opened it up, and had several PCIe slots available. Would it work to have a number of these in parallel, I wonder...? A mining rig...
Good thing you have one in hand. Think back 2-3 years: I went to buy an older Coral priced at $60, and the next thing I knew, every one in sight was scooped up and vendors were selling them at $190, even $300. Even if you could afford it, they were hard to find, with lead times of months for the next available batch (same thing with the older versions of the Raspberry Pi).
What brings a person to learn this kind of stuff, and where do I sign up?
Just being a nerd or a dork. The same thing that brings people to games or sports
We signed up at birth. You KNOW it when you are a geek at heart 😅
Want this on a slot for my Framework 16 instead of the graphics card.
Get an M.2 carrier and instead of a second SSD, use the TPU :)
Would love to see this on the Pi 5, since it has PCIe, to see what it's capable of.
Suggestion: use an IP camera to reduce the complexity and length of your video, and leave the RTSP magic for a separate one; most viewers who want detection will be using IP cameras. While I use 13 Reolink cameras, beware that Reolink uses LIVE555 for its RTSP server, and the earlier, cheaper models generate problem packets. I waited almost a year to get my hands on some Coral USBs; it's good to see the M.2s available, and that certainly expands my options. Thank you.
You seem to know a fair bit about security cameras; might I pick your brain? What protocol do they use? Is there open-source software that can make use of any/all camera makes and models? Can just any security camera be managed this way, or does each brand have its own proprietary protocols?
What are some good IP cameras for Frigate? Please share the best for overall use and the best for nighttime.
@Kaalkian Frigate's developer is your best bet for the answers you seek.
Love the video! The audio is a little wonky at points. Loud and then quiet then REALLY loud then quiet. Hurt my ears with my earbuds in. Thanks for the info!
It would be interesting to know whether it can run Stable Diffusion, and how fast.
What software did you use to screen record while talking at 4:06? That flow was so smooth.
It's called Screen Studio
Maybe I missed it, but what card did you plug your TPU into to be able to use it in a PCIe slot?
Hi there, nice video! I really thought these things were far more expensive than they actually are.
However, what's your take on using them for larger models? It's my understanding these things do NOT have their own memory, so they'd be sequentially exchanging data with the main computer's memory every few operations, so they're BOUND to be slow and high-latency with only those PCIe lanes available, aren't they?
Say I'd like to run even just a 7B LLaMA, or something in the low hundreds of MB, like the MarianMT translation models: this would be effectively useless, at least for real-time processing, wouldn't it?
Also, and this could be a real bonus for these little things: can they be used *simultaneously* to run a single model?
I mean, could I eventually plug 3 or 4 of them into a single machine and use them for inference of a given model at 12 or 16 TOPS?
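On the "several at once" question: pycoral lets you pin an interpreter to a specific Edge TPU, so a pool of workers can each drive one device, and Coral's docs also describe compiling a model into segments and pipelining it across TPUs. A rough sketch of the first approach (the model path is hypothetical):

```python
from pycoral.utils.edgetpu import make_interpreter

MODEL = "mobilenet_v2_quant_edgetpu.tflite"  # hypothetical path

# One interpreter per Edge TPU; ":0", ":1", ... select among the
# devices enumerated on this machine (PCIe and USB both count).
interpreters = []
for i in range(4):
    interp = make_interpreter(MODEL, device=f":{i}")
    interp.allocate_tensors()
    interpreters.append(interp)

# Each interpreter now serves inferences independently, e.g. one camera
# stream per TPU. Aggregate throughput adds up; per-frame latency doesn't drop.
```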
I don't think LLMs are the best use case for edge computing. LLMs require very little data streaming (literally a sentence to a few paragraphs of text), and the latency of a response is tolerable (theoretically it would take less than a second to send the text to the cloud and receive the reply if the GPTs could run instantly). This discounts privacy concerns, but you could also hook up your own big stationary GPU and send data to your house instead of the cloud in the same way.
Anyway, the point is you want to think about edge computing for things where the upload time for the data would be greater than the time for the small accelerator to actually do the work. That's things like video, lidar, and audio streams.
Another interesting case is when you blur the lines: you use the local accelerator to embed the live stream, so you can send the smaller embeddings to the cloud for a more complex model that requires a big GPU. Basically, edge devices act as data compressors to reduce the necessary upload bandwidth.
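A toy sketch of that last pattern, just to show the bandwidth arithmetic (embed() is a stand-in for whatever small local model you'd actually run):

```python
import numpy as np

def embed(frame):
    # stand-in for a real local model (e.g. on the Coral):
    # reduce a frame to a short feature vector
    return frame.astype(np.float32).mean(axis=(0, 1))

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # one fake 1080p frame
vec = embed(frame)
print(f"{frame.nbytes:,} bytes/frame -> {vec.nbytes} bytes/embedding")
# ~6.2 MB per raw frame vs 12 bytes here; even a realistic 512-float
# embedding (~2 KB) is a ~3000x reduction in upload bandwidth
```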
The fact that the Coral TPU only works with older Python versions is a big disadvantage... I struggle with it too, since I'm very accustomed to the new functionality in Python 3.10-3.12.
It's not ideal, but you could always have your Coral code act as a local server running in a separate process, and have your main logic run on newer Python. This also gets around the global interpreter lock.
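A bare-bones sketch of that split, using only the standard library. This process runs under the older Python that pycoral supports, and your main app (on whatever version) just POSTs frames to it; detect() is a placeholder for the real pycoral call:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def detect(image_bytes):
    # placeholder: decode the frame and run pycoral inference here
    return [{"label": "person", "score": 0.87}]

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        payload = json.dumps(detect(body)).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    # loopback only: the newer-Python client talks to 127.0.0.1:8765
    HTTPServer(("127.0.0.1", 8765), InferenceHandler).serve_forever()
```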
In my experience, the preferred method is C++ using the libcoral API, which wraps the TensorFlow Lite C++ API. Nice performance gains! 😁
You do realize that the Raspberry Pi 5's PCIe is only PCIe 2.0 with a single lane, right? That has a maximum theoretical throughput of 500 MB/s, which is less than the USB version of the Coral TPU... So when using a Pi 5, the USB Coral would absolutely be the better choice.
PCIe 3.0 works just fine on the pi 5.
@AlmightyEye But the TPU itself is limited to PCIe 2.0, which also implies that both the M.2 and USB versions are capped at 500 MB/s.
Got it working as a HA add-on, smokes my Amcrest doorbell sensor. "Chihuahua Pitbull mix detected, 109% certainty". Seriously though, wondering when GPU support will work.
Didn't we already have this with the Intel Neural Compute Stick 2?
Any chance this thing could be paired with a GPU mining rig running eight RTX 3070 Ti cards?
Does Intel also sell a neural compute stick for the same purpose?
It's 2024 and Linux still makes people use a DOS prompt for things, like a caveman. Hey look, I'm dialing up WOPR with my 2400-baud modem in my terminal program.
Netvue can already do person detection on a 400 MHz Wi-Fi camera; not sure you need an AI accelerator for that.
How do you know for sure this will be completely local and untapped, though?
Running Frigate with an 8th-gen i5 and a Radeon 5450 at 17 ms inference: 6 cameras at 640x360 for person detection, with 3 MP recording. The TPU seems outdated now.
The USB one isn't only slower because of the interface; it's also underpowered. The one you got isn't the greatest either; you should have gotten the dual edge version, which is by far the best price:performance ratio.
I guess you already know, but you should use compose files; otherwise maintenance of your containers will become hell...
So how do you use this for AI? Because all I see right now is a very hard-to-set-up Ring camera.
You can see the AI bits demonstrated at around 12:30 - it's running inferences on a TensorFlow model to perform object detection. You can detect cars, people, cats, dogs, etc. in custom zones throughout the frame.
He made it a bit more work to install than it has to be (though I'd argue that a 13-minute video doesn't constitute "very hard to set up"). It would be easier using `docker compose` (as opposed to `docker run`), and making it work with a webcam was extra work.
And it's not just a Ring camera, it's a Ring camera that doesn't send your video to the cloud, doesn't harvest your data, works even when your Internet is down, gives you full control, etc. It's very powerful and useful.
@reidprichard I get what you *would* use it for; my question is *how*. I'm not seeing any of that in this video.
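For the "how": the AI piece is ordinary TF Lite object detection running on the Coral. Stripped of Frigate's video plumbing, it looks roughly like this with pycoral (the model file is the SSD MobileNet example from Google's coral test-data repo; the paths here are assumptions):

```python
from PIL import Image
from pycoral.adapters import common, detect
from pycoral.utils.edgetpu import make_interpreter

# SSD MobileNet v2 compiled for the Edge TPU
interpreter = make_interpreter(
    "ssd_mobilenet_v2_coco_quant_postprocess_edgetpu.tflite")
interpreter.allocate_tensors()

image = Image.open("frame.jpg")  # one grabbed camera frame
_, scale = common.set_resized_input(
    interpreter, image.size, lambda size: image.resize(size, Image.LANCZOS))
interpreter.invoke()

# each detected object has a class id, confidence score, and bounding box
for obj in detect.get_objects(interpreter, score_threshold=0.5,
                              image_scale=scale):
    print(obj.id, f"{obj.score:.2f}", obj.bbox)
```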
Which screen, mouse, and keyboard are you using with the Raspberry Pi in this video?
I cannot find an adapter that would let me plug this PCIe Mini Card into a PCIe slot.
Nice work, binge-watching your stuff! What's the screen recording software you're using, with the rounded-border facecam in the bottom right?
What is the tiny monitor that we see early in the video clip?
I'm sure glad it got easy.
Please specify a moment in the video where I can see something "fast".
What's the portable screen? Where can I get one?
Wonder how much faster the Coral made the identification compared to just the CPU.
How usable or sensible is this hardware/software for running an LLM (e.g. LLaMA) with a voice assistant?
I have the idea of using it for Assist in Home Assistant (so I don't need the OpenAI API, which can get expensive if used often).
I'm curious whether the TensorFlow models are updatable. I.e., could an enterprising person add a model that detects an approaching person (vs. a departing one)? Same with a vehicle.
Why did you use one unit when you can use at least 5 PCIe units?
ZOMG, not you using Docker for the Frigate server and not putting MQTT in the same compose file... that just made me want to defenestrate myself.
Can you use that TPU in a PC to train deep learning models much faster?
I would like to pass the analysis of video streams from some IP cameras to a Python program of mine that uses some public models, and others written by me, to determine who is in the house (face recognition), where exactly they are (in the 2D space of the floor plan), and what they are doing (hand and gesture recognition).
With 5-6 cameras I should be able to dispense with dozens of motion, occupancy, brightness, and door/window sensors, and at the same time have a better understanding of what is going on in the house and command the home automation accordingly.
What is the best hardware infrastructure to implement something like this?
The annoying thing with Google TensorFlow is how often stuff breaks with new releases. Over the last 20 years, I've lost count of how many times Google decided to break backward compatibility in their projects. Google's idea of open source kinda blows.
It's all their software, really. They just don't care; they have the attention span of a toddler. Microsoft is as evil as Google, but at least they make good dev tools.
Got the M.2 version, but I run HassIO in a VirtualBox VM. I wonder how I can pass it through?
I have the module detected by Windows after connecting it to a PCIe adapter, but I also have (somewhere) an M.2-to-SATA adapter. Could that do the trick?
Or maybe put it in a SATA-to-USB case and plug it into a port?
I have heard of at least one graphics card manufacturer installing an NVMe drive on the back of a GPU, and now, knowing that an M.2 AI accelerator exists, I wonder if either of the graphics card companies would include code to directly leverage an installed AI accelerator for their accelerated GPU features.
Many GPUs have AI acceleration built in; why would you install a slower secondary accelerator when you already have a GPU that could run it anyway?
@@Physuo because it's busy processing a different task?
Pretty sure you do not need to add root to any groups ;)
And running sudo as root, priceless :D
So much room for activities
I was checking on this, but it's already deprecated.
USB 3 is supposed to be 5 Gb/s, not 650 MB/s, I'm pretty sure. (Though 5 gigabits/s is about 625 megabytes/s, so those are roughly the same figure in different units.)
I'm curious if it's possible to use the PCIe version via USB with a simple adapter. I understand it might affect speeds, but could it at least hypothetically function? I'm trying to research this ATM. Also, the USB version is about $30 more than the PCIe version, so... ya know. Pinching pennies where I can and all that.
My goal is to make an LLM one day, but I don't have a good enough graphics card to handle larger LLMs.
Can you give me an idea of what you would do for strictly an LLM, but one that can handle a 70B model?
Ideas for small but powerful setups, or even something specific to that, would be cool. I was ready to buy a ZimaBlade and board and a Coral, but I want the best without buying an A100.
Yes, hardware is one thing, but Google's software support is lacking. I had my own struggles with it. NVIDIA CUDA is supported much better. Plus, with Google, you never know when they'll kill it off and leave you high and dry. I can't recommend it.
I ended up going with Libre boards; of the 3 flavors, the top one fully blows away the Raspberry Pi 4. When pricing gets anywhere close to $100 per item, it's an auto-eject for this budget-happy hobby researcher...
So... slower than without it?
Will that Coral TPU speed up my Roboflow image detection on the Steam Deck? I'm getting 6 fps.
Would this work on the Orange Pi 5 Plus?
God, man... the only thing I can think of is an AI exoskeleton with electrode sensors running up the muscles to help predict the wearer's movement 😅
I use Wyze cameras running hacked firmware so they're dumb and stay local, not sending my video and audio to China before letting me see it. I have them set up in the backyard to make sure there are no bears, mountain lions, raccoons, skunks, etc. before I let my dogs out in the middle of the night to pee. But if you had young kids, this would be a MUST to check for animals and Disney employees before letting them out in the middle of the night to pee.
Using ffmpeg just like that is... not great. You need serious re-encoding performance *and* you need to crank the settings up, so the image quality is decent enough for the inference to make actual guesses instead of missing all the time because you're sending a pixelated 320x240 mess. *Or* _(hear me out)_ you could use a proper webcam or surveillance cam that pumps the stream out itself.
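In that spirit, something like this (stream URLs hypothetical; the flags are stock ffmpeg options) keeps enough pixels for the detector instead of starving it:

```python
import subprocess

# Re-encode the camera stream at a resolution/quality the detector can use.
# Lower -crf means better quality; -preset trades CPU time for compression.
subprocess.run([
    "ffmpeg",
    "-i", "rtsp://camera.local/stream",   # hypothetical source camera
    "-vf", "scale=1280:720",              # don't starve inference with 320x240
    "-c:v", "libx264",
    "-preset", "veryfast",
    "-crf", "20",
    "-f", "rtsp", "rtsp://127.0.0.1:8554/detect",  # hypothetical restream target
], check=True)
```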
The content is interesting, but I couldn't get past the sound issues. Way too echoey.
I want to put a Coral in a Clockwork uConsole.
Is there a Discord or something to join?
Hi, are you for hire, as I need help with building a prototype?
Is the voice AI-generated? It's kinda choppy.
Use some of that AI to fix your audio
The Raspberry Pi 5 is indeed pretty far behind the curve.
But it also has the biggest community support.
ARM boards will be behind for a while; pretty sure the ZimaBoard is x86.
@jewlouds ARM CPUs often have 4x PCIe 2.0 or 3.0 available. The Raspberry Pi 5 is far behind the curve among ARM boards too.
@wombatillo Yeah, but driver support for most things ARM is severely lacking. And from my understanding, the RPi, being the device with the largest community, would get specific driver support before anything else. x86 is way more likely to have compatible driver support simply because everything is already running on x86 hardware.
Not in community support. And don't forget the RPi was intended as a single-board computer for tinkerers/learning. I know it's not seemingly pushed that way, but it's not meant as a powerful ARM computer.
Wondering if it's capable of running small LLMs such as Phi-3 4K Instruct, with a decent SBC of course, maybe the 16 GB RAM one. What do you think?
Sorry for the bad English, I barely know English.
That one sucks; it won't work with LLMs.
Is it possible, or even practical, to have a dozen Coral AI PCIe cards in the same system?
Asus has the CRL-G116U-P3DF, which houses 16 of these modules, for a total of 64 TOPS. However, these Coral TPU TensorFlow Lite accelerators also came out in 2019, so they're hardly the best option at present. For half the cost, you could buy an RTX 4080 12GB and get 305 TOPS.
@KalaniMakutu See, this is something I've been trying to figure out for the longest while: which products are best for AI compute density, and which for performance per watt/weight.
One step closer to Skynet
Recording only when there is a person, with security cameras? Metal Gear Solid taught me how to deal with that, hehe (cardboard box).
Sir, can you please tell me how I can inspect Netflix API traffic?
So I could run this on any mini server with an M.2 slot, right? Sounds like the way easier option to me.
It has to be a correctly keyed M.2 slot. I am running one with CodeProject.AI and Blue Iris, and I get sub-100 ms person detection times. I had to buy an M.2-to-PCIe adapter card because I have the dual TPU, and it's not keyed the same as an NVMe drive; mine only has an E key.
Can it run LLaMA 7B?
Too much rapid jumping around to follow, and limited ability to follow the commands used. Seems very rushed.
What is Warp? He used it for remote tasks.
An AI-enhanced terminal.
Suddenly, someone needed to sell a bunch of slow, 3-year-old inference chips with very limited use. This is their story.
An RTX 4090 is more energy efficient than an equivalent number of these things, assuming no sparsity, at 660 INT8 TOPS for ~380 W (ignoring the dual-Epyc or PCIe-switch cards needed to run 160 of them in the same machine and still come in below the base-clock INT8 performance). I suppose you could attempt to run that many off USB somehow on a 16-lane consumer processor, but it won't be able to cope with the interrupt management and USB overhead, since it's a garbage protocol. That 660 figure actually understates it, since the 4090 will be running quite a bit faster than base clock at 380 W draw, but it doesn't matter: 0.576 W/TOP vs 0.8125 W/TOP. Sparsity potentially halves that number. Of course, most of the models that can be quantized to INT8 will run in INT4 as well, so you could be running everything at ~1.3 INT4 PetaOPS.
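Spelling out the per-watt arithmetic being claimed there (the 0.8125 W/TOP figure for the Corals presumably includes host overhead, since Google's datasheet number for the Edge TPU chip alone is 2 W for 4 TOPS, i.e. 0.5 W/TOP):

$$
\frac{380\,\mathrm{W}}{660\,\mathrm{TOPS}} \approx 0.576\,\mathrm{W/TOP}
\qquad\text{vs.}\qquad
\frac{2\,\mathrm{W}}{4\,\mathrm{TOPS}} = 0.5\,\mathrm{W/TOP}\ \text{(chip only)}
$$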
Asus makes the only PCIe cards that hold 16 of them per card, and you shouldn't expect any of their products to work correctly. They're either going to require a 1/1/1/1/... x16 bifurcation that nothing supports, or Asus is going to charge you $500-600 for a two-generation-outdated PCIe switch that they've probably programmed wrong, if the redrivers on their Threadripper boards are any indication. The fact that they need a 2-slot card to dissipate 54 W on these is a good indication. Their new boards with USB4 state that the cables shouldn't be removed or swapped until all power to the motherboard is killed (you know, USB-C connectors, those famously stable things). Their last Threadripper board managed to have slots that stopped working if you used other slots, on a platform with 128 lanes available, and lacked enough 8-pin PCIe power to the motherboard to run more than two GPUs... and it still sounded like less of a disaster than their old X99 boards.
TL;DR: if you're going to buy outdated tech junk, instead of feeding money to whatever's being promoted here, hop on eBay and treat your home network to an upgrade: P2P 56 Gb Ethernet / InfiniBand dual-port cards and some SR4 transceivers, so you can quit being insulted by things like 2.5 Gb Ethernet in brand-new machines. Or grab any of the FPGAs available, which I'd bet will destroy the performance of these things (plus people might actually hire you for "Languages known: VHDL/Verilog" on a resume; they won't for "I SET UP PYTHON AND DETECTED FACES!!!!!!11").
Why Secure Boot on a Linux computer?
I checked and they are all out of stock... When and from what store did you buy the TPU?
Amazon has them in stock.
Local AI is the future.
Can you copy and paste those config files and Docker commands into the video description? Super fast cuts of you copying and pasting things in a video are not helpful.
medium.com/@timothydmoody/coral-edge-tpu-home-lab-frigate-budget-surveillance-system-7e26acb834e4
How much will all of this cost me?
There's something on your lips. You might wanna take care of that.
On a side note: I've followed this since its release. You can tell how popular a device is (or a motorcycle, whatever) by its community size, specific third-party devices, etc.
Might as well be a bottle of Zima