RoboTF AI
United States
Joined Mar 20, 2024
Just another engineer fooling around in a GPU powered Kubernetes based AI/LLM Lab. We play around with different GPUs, different Large Language Models, run tests and see what we can learn together.
I am not an expert, and make bad decisions, please do not follow anything I do on this channel.
Building Ubuntu Server for AI LLMs from scratch Part 3: ComfyUI, Open WebUI, Flowise, n8n, and more!
Even with a cold and a terrible voice - this week we are working on Part 3 of the series, getting a huge set of local AI tools up and running to use and play with on top of our LocalAI server. We get ComfyUI, Open WebUI, Flowise, n8n, Postgres, Chroma, and the Unstructured API all running locally in our lab! We also quickly cover integrating VS Code with our AI lab using Continue.dev (www.continue.dev/) and our LocalAI stack. Everything done locally - go build your workflows, agents, and more! (A minimal sketch of calling the LocalAI API from Python follows the repo link below.)
ComfyUI - Image/Video Generation - github.com/comfyanonymous/ComfyUI
Open WebUI - For local served chat application - github.com/open-webui/open-webui
Flowise - No-code drag-and-drop AI/LLM application builder - github.com/FlowiseAI/Flowise
n8n - Low-code AI/LLM automation builder - github.com/n8n-io/n8n
Postgres - Database backend for Flowise and n8n
Chroma - AI/LLM database for embeddings and vector storage - docs.trychroma.com/docs/overview/introduction
Unstructured API - For pre-processing many file types - github.com/Unstructured-IO/unstructured-api
Github Repo for RoboTF AI Suite of compose files and instructions:
github.com/kkacsh321/robotf-ai-suite
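Everything in this stack (Open WebUI, Flowise, n8n, Continue.dev) talks to LocalAI through its OpenAI-compatible API. As a minimal sketch, assuming LocalAI is listening on localhost:8080 and using a hypothetical model name, a chat call from Python looks like this:

```python
# Minimal sketch: chat completion against LocalAI's OpenAI-compatible API.
# The base URL, port, and model name below are assumptions for illustration --
# match them to whatever your localai-compose.yaml actually exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed LocalAI address/port
    api_key="not-needed-for-local",       # LocalAI does not require a real key by default
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",        # hypothetical model name
    messages=[{"role": "user", "content": "Say hello from the lab."}],
)
print(response.choices[0].message.content)
```

Roughly the same base URL is what the other tools get pointed at when you configure an OpenAI-compatible provider in them.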
Just a fun day on the workbench, grab your favorite relaxation method and join in.
Our website: robotf.ai
Machine specs here: robotf.ai/Machine_Lab_Specs
(These are affiliate-based links that help the channel if you purchase from them!)
Machine Components:
Open Air Case amzn.to/4a7V9pi
30cm Gen 4 PCIe Extender amzn.to/3Unhclh
20cm Gen 4 PCIe Extender amzn.to/4eEiosA
2 TB NVME amzn.to/4gWFcFb
EVGA SuperNova 1600 G+ Power Supply amzn.to/3XWorBB
240GB Crucial SSD amzn.to/406aIJA
G.SKILL Ripjaws V Series DDR 64GB Kit amzn.to/4dAZrWm
Core I9 9820x amzn.to/47UuIST
Thermalright AK90 CPU Cooler: amzn.to/4iYRSwf
Noctua Thermal Paste: amzn.to/4fMenSq
Supermicro CX299-PGF Logic Board amzn.to/3BxbWVr
Remote Power Switch amzn.to/3BubQOg
Your results may vary due to hardware, software, model used, context size, weather, wallet, and more!
Views: 546
Videos
Building Ubuntu Server for AI and LLMs from scratch Part 2: Nvidia Cuda Drivers, Toolkit, LocalAI!
866 views · 14 days ago
Building Ubuntu Server for AI and LLMs from scratch Part 2: Nvidia Cuda Drivers, Toolkit, LocalAI! This week we are working on Part 2 of the series and getting the node bootstrapped with all the necessary software needed to run our first target of LocalAI (localai.io/) with Docker and Docker Compose. Github Repo for RoboTF AI Suite of compose files and instructions: github.com/kkacsh321/robotf-...
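Before LocalAI will see the GPUs, the driver and toolkit install has to be visible from userspace. One quick sanity check, assuming a CUDA-enabled PyTorch happens to be installed on the node (it is not required by the setup itself), is:

```python
# Sanity-check sketch (assumes a CUDA-enabled PyTorch install, which is not
# part of the LocalAI setup itself): confirm the Nvidia driver and CUDA
# runtime are visible before wiring up Docker and LocalAI.
import torch

print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```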
Building Ubuntu Server for AI and LLMs from scratch Part 1: Hardware build, Open Air Case, Multi-GPU
2.1K views · 21 days ago
Building Ubuntu Server for AI and LLMs from scratch Part 1: Hardware build, Open Air Case, Multi-GPU This week we are starting to build another AI server from scratch using what we have laying around in the lab, plus some new parts. We will walk through putting the case and hardware together while blabbing and maybe answering a few viewer questions along the way. This will be the first in ...
Cutting up a Dell R720xd to add 2x Nvidia Tesla M40 24GB GPUs - Llama 3.3 70B, QwQ 32B, and more!
683 views · 21 days ago
Cutting up a Dell R720xd to add 2x Nvidia Tesla M40 24GB GPUs - Llama 3.3 70B, QwQ 32B, and more! This week we are literally cutting up my Dell R720xd TrueNAS Scale server to add 2 Nvidia Tesla M40 24GB GPUs (old cards from my original setup) as a backup inference server for when my other nodes might be under maintenance. We will take it through Llama 3.3 70B IQ4, Qwen QwQ 32B Preview, Mistra...
3 Uncensored AI comedians do the 12 Days of Christmas and roast AI Engineers with homelabs! 🎄
117 views · 28 days ago
3 Uncensored AI comedians do the 12 Days of Christmas and roast AI Engineers with homelabs! 🎄 This week on the RoboTF Development Desk: We whip together the ability to have three different AI Robot comedian personalities running a comedy show together. Random conversation, Text To Speech, and so on. Tonight's topic is "AI Engineers with GPU Homelabs and the 12 days of Christmas" Happy Holidays!...
LocalAI LLM Testing: Llama 3.3 70B Q8, Multi GPU 6x A4500, and PCIe Bandwidth during inference
2.2K views · 1 month ago
LocalAI LLM Testing: Llama 3.3 70B Instruct Q8, Multi GPU 6x A4500, and PCIe Bandwidth during inference This week we are taking Llama 3.3 70B (huggingface.co/bartowski/Llama-3.3-70B-Instruct-GGUF) at a Q8 quant running 96k of context through some tests but focusing on showing the PCIe bandwidth during inference in a multi gpu setup. Hopefully providing more insight for hardware requirements, an...
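For anyone who wants to reproduce the PCIe-bandwidth view outside of the dashboards shown on screen, one possible approach (not necessarily the tooling used in the video) is to poll NVML from Python while inference is running:

```python
# Rough sketch of watching per-GPU PCIe traffic during inference via NVML
# (pip install nvidia-ml-py). The 1-second poll interval is arbitrary;
# nvmlDeviceGetPcieThroughput reports KB/s over a short sampling window.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

while True:
    for i, handle in enumerate(handles):
        tx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_TX_BYTES)
        rx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_RX_BYTES)
        print(f"GPU{i}: TX {tx / 1024:.1f} MB/s  RX {rx / 1024:.1f} MB/s")
    time.sleep(1.0)
```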
3 AI Comedians talk about Running a GPU Homelab for AI and Machine Learning using uncensored models!
252 views · 1 month ago
3 AI Comedians talk about Running a GPU Homelab for AI and Machine Learning using uncensored models! 🦾 🤣 This week on the RoboTF Development Desk: We whip together the ability to have three different AI Robot comedian personalities running a comedy show together. Random conversation, Text To Speech, and so on. Tonight's topic is "Running expensive GPU's in a homelab for AI and Machine Learning"...
Project: Count your tokens for Huggingface models! 🪙 AutoTikTokenizer and RoboTF LLM Token Estimator
189 views · 1 month ago
Project: Count your tokens for Huggingface models! 🪙 AutoTikTokenizer and RoboTF LLM Token Estimator This week on the RoboTF Development Desk: Tired lab day after Thanksgiving... We use Streamlit, Python, and AutoTikTokenizer to create a quick Token Estimator for HuggingFace open source models! I want to encourage everyone to go build something, anything, learn, and have some fun along the way. ...
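The core of such an estimator is tiny. A minimal sketch, assuming the autotiktokenizer package's from_pretrained interface and using a placeholder Hugging Face model id:

```python
# Minimal token-count sketch (pip install autotiktokenizer). The model id is a
# placeholder; from_pretrained returns a tiktoken-style encoder whose encode()
# output length is the token count for that model's tokenizer.
from autotiktokenizer import AutoTikTokenizer

encoder = AutoTikTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
text = "How many tokens will this prompt cost me?"
print(len(encoder.encode(text)), "tokens")
```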
Mistral 7B LLM AI Leaderboard: Gigabyte AMD Radeon RX 7600 XT 16GB uses ROCm to hit leaderboard!
1.5K views · 1 month ago
Mistral 7B LLM AI Leaderboard: Gigabyte AMD Radeon RX 7600 XT 16GB uses ROCm to hit leaderboard! This week in the RoboTF lab: We take the Gigabyte AMD Radeon RX 7600XT through the Mistral 7B Leaderboard tests using ROCm. Where do you think it will land? Final Results around 15 min mark RoboTF Website: robotf.ai/Mistral_7B_Leaderboard Spend a ☕️ worth of time in the RoboTF lab, and let's put som...
LLM Testing Behind the Scenes AMD Radeon 7600 XT ROCm set flash_attention to true! 🤦♂️
790 views · 1 month ago
LLM Testing Behind the Scenes AMD Radeon 7600 XT ROCm set flash_attention to true! 🤦♂️ This week in the RoboTF lab: Quick follow up on the interesting performance profile with ROCm and the quant versions. TLDR - turn flash_attention to true, watch the performance difference, and watch me facepalm for not trying this earlier. This is the previous video on the flash_attention setting with
LLM Testing Behind the Scenes Unboxing Gigabyte AMD Radeon 7600 XT 16GB ROCm vs Vulkan battle!
937 views · 2 months ago
LLM Testing Behind the Scenes Unboxing Gigabyte AMD Radeon 7600 XT 16GB ROCm vs Vulkan battle! This week in the RoboTF lab: We do some behind the scenes testing for pre-leaderboard runs with a brand new AMD Radeon 7600 XT 16GB and bring everyone along for the ride. We need to figure out which backend to run this card with for the leaderboard tests. Turns into ROCm vs Vulkan backend battle - whi...
Mistral 7B LLM AI Leaderboard: Unboxing AsRock Intel ARC A770 16GB and running it through our tests!
1.9K views · 2 months ago
Mistral 7B LLM AI Leaderboard: Unboxing AsRock Intel ARC A770 16GB and running it through our tests! This week in the RoboTF lab: We unbox a brand new AsRock Intel ARC A770 16GB and I fight it for weeks! A mash of Linux, Intel drivers, Intel oneAPI kit issues, etc., etc....but it works with the SYCL backend and llama.cpp - let's run it through the leaderboard tests. Where do you think it will land amo...
Mistral 7B LLM AI Leaderboard: Apple M1 Max MacBook Pro uses Metal to take on GPU's for their spot!
4K views · 2 months ago
Mistral 7B LLM AI Leaderboard: Apple M1 Max MacBook Pro uses Metal to take on GPU's for their spot! This week in the RoboTF lab: Quick mid week update where we take my personal MacBook Pro M1 Max with 32GB from 2021 through the Mistral Leaderboard tests. Where do you think it will land amongst the GPUs and other CPUs we have tested? Final results at 13 Min Mark. Leaderboard is live: robotf.ai...
Mistral 7B LLM AI Leaderboard: i5-12450H 32GB DDR4 Mini PC takes on the leaderboard?
947 views · 2 months ago
Mistral 7B LLM AI Leaderboard: i5-12450H 32GB DDR4 Mini PC takes on the leaderboard? This week in the RoboTF lab: The whole lab is completely torn apart, everything in the house is a disaster, and we're fixing a really dumb power button issue with the main 6x A4500/3090 lab machine... Then we jump into running a Mini PC with a mobile-based i5 through the leaderboard. Where will it land? This was long but fun, and agai...
Halloween Stories via Streamlit, Langchain, Python, and LocalAI (or OpenAI) with Text to Speech!
239 views · 3 months ago
RoboTF Halloween Stories via Streamlit, Langchain, Python, and LocalAI (or OpenAI) with Text to Speech This week in the RoboTF lab: I want to encourage everyone to go build something, anything, learn, and have some fun along the way. We introduce the RoboTF Halloween Stories application, walk through it a bit, listen to a few stories, and play with different setups. We even do a quick demo of LocalAI ru...
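Under the hood, the LangChain-to-LocalAI wiring is just pointing an OpenAI-style chat model at the local endpoint. A hedged sketch (the URL and model name are assumptions, not the app's actual config):

```python
# Sketch of swapping LocalAI in for OpenAI in a LangChain app. The endpoint
# URL and model name are illustrative placeholders only.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",  # assumed LocalAI endpoint
    api_key="not-needed-for-local",
    model="llama-3.1-8b-instruct",        # hypothetical model name
)

story = llm.invoke("Tell me a two-sentence spooky homelab story.")
print(story.content)
```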
Mistral 7B LLM AI Leaderboard: Nvidia RTX A4500 GPU 20GB Where does prosumer/enterprise land?
755 views · 3 months ago
LocalAI LLM Tuning: WTH is Flash Attention? What are the effects on memory and performance? Llama3.2
467 views · 3 months ago
Mistral 7B LLM AI Leaderboard: Unboxing an Nvidia RTX 4090 Windforce 24GB Can it break 100 TPS?
3.3K views · 3 months ago
Mistral 7B LLM AI Leaderboard: The King of the Leaderboard? Nvidia RTX 3090 Vision 24GB throw down!
739 views · 3 months ago
Mistral 7B LLM AI Leaderboard: Unboxing an Nvidia RTX 4070Ti Super 16GB and giving it run!
1.5K views · 4 months ago
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia RTX 4060Ti 16GB
906 views · 4 months ago
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia Tesla M40 24GB
813 views · 4 months ago
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia GTX 1660
565 views · 4 months ago
Mistral 7B LLM AI Leaderboard: Rules of Engagement and first GPU contender Nvidia Quadro P2000
423 views · 4 months ago
Mistral 7B LLM AI Leaderboard: Baseline Testing Q3,Q4,Q5,Q6,Q8, and FP16 CPU Inference i9-9820X
406 views · 4 months ago
Mistral 7B LLM AI Leaderboard: Baseline Testing Q3 CPU Inference i9-9820X
373 views · 4 months ago
LocalAI LLM Testing: Part 2 Network Distributed Inference Llama 3.1 405B Q2 in the Lab!
2K views · 5 months ago
LocalAI LLM Testing: Distributed Inference on a network? Llama 3.1 70B on Multi GPUs/Multiple Nodes
8K views · 5 months ago
LocalAI LLM Testing: Llama 3.1 8B Q8 Showdown - M40 24GB vs 4060Ti 16GB vs A4500 20GB vs 3090 24GB
16K views · 5 months ago
LocalAI LLM Testing: How many 16GB 4060TI's does it take to run Llama 3 70B Q4
22K views · 6 months ago
First 5:30 is talking... I bet it was going to start any time but I just ran out of patience.. GR8T Job. You RoCk!
Thanks!
Wow thanks for that, much appreciated!
Hi, I am new to all of this. I followed the video, but got an error when I ran the docker command after all the docker images finished downloading. My Ubuntu system has one 512GB NVMe drive. Error response from daemon: error while creating mount source path '/var/localai/models': mkdir /var/localai: read-only file system
Do a `git pull origin main` to pull the latest code down (that path is very specific to my setup) - or you will need to change the `volumes` setup in the localai-compose.yaml to mimic this instead github.com/kkacsh321/robotf-ai-suite/blob/main/LocalAI/localai-compose.yaml#L24C1-L26C26 which is just `- ../models:/models`.
Thank you for the quick reply. Yes, I did pull your latest git repo, set the volume path /var/localai/models, created the two directories /var/localai/ and /var/localai/models with mkdir, and followed the instructions to run newgrp docker and set permissions.
Absolutely excellent work! Just what I needed. Just in time. :)
Great to hear!
Did bro just cover up the sound of his toilet flushing by going "wow did yall hear those jets flying over the house?"😂😂
If only I could have the full lab setup in the bathroom! 🚽
Can you mix GPUs with different VRAM to run Llama? E.g. 24GB and 16GB RTX 40 series?
Yep, you can absolutely mix Nvidia cards together. We have another video where we address just this question: th-cam.com/video/guP0QW_9410/w-d-xo.htmlsi=lHSbK6WSS8lvdoFV if you want a more in-depth answer.
Thanks for sharing all this invaluable information. Nicely explained from the start and understandably. 🎉
My pleasure 😊
I was tinkering with llama2.cpp last year, and had a lot of fun with it in the home lab. Been away for a while, to say I'm more than impressed with both localai and this video series is an understatement. Excellent, excellent work. Whoever the hell you are, a big thank you!
Hey, I love the channel. I started making my own setup a few months ago and came across this when researching Docker. New to programming environments and Ubuntu. I come from the industrial automation side, PLCs and networking. I am going to do a fresh install following your guide on a node I am setting up for practice.
Much appreciated! Hope you find it all a good starting place to get up and going quickly, getting started is always the hardest part.
@RoboTFAI exactly. I got my primary system set up after a month of configuring. I learned a lot about Linux in the process through troubleshooting, and I think 5 installs and builds from the ground up. This will be perfect for the secondary setup I want to connect to. I will get you a coffee for the time and energy, lol
If I interpret your test right, then you have enough memory with 3 cards (48GB), but not enough GPU processing power. I have a 3090 and 2x 4060 Ti 16GB. Will test if I get more than 5 t/s...
The speed is more to do with the limited 128-bit memory interface and 272 GB/s of bandwidth of the 4060. Your 3090, for example, is three times faster with its 384-bit memory interface and 935.8 GB/s of bandwidth. If you swapped those two 4060s for another 3090, you should achieve ~15+ t/s. An M4 Max with its 546 GB/s of bandwidth runs at 8-9 t/s. The new Strix Halo with its memory bandwidth of 238 GB/s should run at around ~4 t/s. My AM5 board with RAM set at 6000 MHz has a memory bandwidth of 90 GB/s and runs 70B Q4 at 1.5 t/s.
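The rule of thumb behind those numbers: for a memory-bandwidth-bound decoder, single-stream tokens per second is roughly bandwidth divided by the bytes read per token (about the model's size in VRAM). A back-of-the-envelope check, using the bandwidth figures quoted above and an assumed ~40 GB working size for a 70B Q4 model:

```python
# Back-of-the-envelope estimate only: single-stream decode speed is roughly
# memory bandwidth / bytes touched per token (~ model size in VRAM).
# Bandwidth figures are the ones quoted in the comment above; the 40 GB model
# size is an assumed round number for a 70B Q4 quant.
def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 40.0
for name, bw in [("RTX 4060 Ti", 272.0), ("RTX 3090", 935.8), ("M4 Max", 546.0)]:
    print(f"{name}: ~{est_tokens_per_sec(bw, MODEL_GB):.1f} t/s upper bound")
```

Real multi-GPU setups land below this upper bound because of splitting overhead, which is consistent with the observed numbers.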
I have an R730xd with 1 M40 installed. I'm looking to install my second M40 but it looks like it physically doesn't fit into the second PCIe slot. Do I have to cut something? Hahah
I don't have an R730, but assume it's very similar to the 720 - the best I could say is maybe? It should support two full-length cards according to the docs....I would be concerned about the cooling of course. I have my other PCIe slots filled, so there was 0 chance for me, and I'm not afraid of hacking things up!
Have you tested llama 70b on the cpu only?
Wow, I thought that would be much, much more traffic. So my Thunderbolt 3 eGPU is perfectly fine as a complement to an internal card.
most people do, at least for just inference - which is why I decided I would show folks the data from "enterprise" level cards.
This video is really good. Very informative. I would like to know how different PCIe slot sizes affect tensor splitting, as the cards send their results to each other.
As in say 16x vs 8x etc? I think we could do that - if interested in just seeing the PCIe traffic in a multi-gpu setup you may enjoy this video th-cam.com/video/ki_Rm_p7kao/w-d-xo.html where we show PCIe traffic during loading and inference with enterprise grade cards that emit those metrics.
@RoboTFAI Hey, thank you very much. I'll see it tonight.
would be interesting to see some 13/20/70b models too
hi, 2x RTX 4060 Ti 16GB = 32GB VRAM, more or less??
Yes, at least for running inference!
Thank you so much for sharing this; it’s been really useful. Could you please consider creating a written guide as well?
Sure thing! Will add the base commands for bootstrapping and getting started etc. to the repo as soon as I get time. Thanks for the suggestion!
I am certainly learning stuff from your videos. Perfect pace. Thank you for your efforts.
Much appreciated and great to hear!
This is amazing. Thanks for all the effort
My pleasure! Came from all the viewers asking for these guides. Appreciate you guys watching
Great video, exactly the content that was necessary. I propose a K8s setup for one of the next videos... I played locally with Ollama, vLLM, Triton, and Triton with the vLLM backend. But LocalAI was the next thing I wanted to check out...and just in time ;)
Much appreciated! Great idea, I think we will go through a few more things - then repeat for K8s (since that's what I prefer to have my nodes in).
Hi, one question: if you mix cards, will they all run at the speed of the slowest one, or will each one run at its max? How much do the cards talk to each other, or do they just split the work and go?
We have another older video answering that question directly if you want to watch th-cam.com/video/guP0QW_9410/w-d-xo.html
@RoboTFAI Thanks, excellent video. It would be great to test a model that does not fit in a single card, and see how different mixes of cards perform when filling the VRAM.
Excellent content as usual: Ash would be proud!
🤣 Thanks for watching, hoping this series will help people get started on their own!
this is amazing! can't wait to see how the distributed inferencing works!
We have a few other videos on the channel that touch on distributed inference! But we will cover that also in this series for people who want to do it.
how are you using Mac OS with NVIDIA GPUs?
We are not, we are running the cards on Ubuntu-based nodes (mostly in Kubernetes) remotely on the channel. We do test the Macs in a few other videos with Metal though!
AI? Apple Intelligence? ;)
I am a retired/disabled IT person myself. I have an Unraid server I like a lot. I have been playing with AI stuff but find my Nvidia 3060 with 12GB of VRAM has its limits. I no longer have a budget to buy much anymore. My Unraid server is so packed I cannot add any more hardware to it, and it has some key Dockers running that I don't like playing with too much. Plus my disability makes it hard to work on PCs in those big cases. So I have an old PC laying around, so I ordered a test bench station, some PCIe extender cables, and some used Nvidia K40s. I plan to do what you are doing here, so I will be following your channel. I'm just unsure if or how I can link all the Nvidia K40 24GB VRAM cards to get 48GB of VRAM. I plan to run ComfyUI, maybe run Dockers on it as well. Looking forward to more of your videos.
Not a boring video at all! I plan to build my own AI server in the very near future! Thanks for the walk through.
Glad you enjoyed it!
The intent of using multiple GPUs is to run larger models, so this test is not very useful and the results might not be surprising. So what about 33B or 70B models with this setup?
I started building mine about 3 weeks ago, but just with an RX 7900 XTX 24GB. First time trying Ubuntu with LLMs. Very soon I was stuck on a network card driver. Apparently some drivers are not available in Linux 😂😂 I am crying while looking for solutions.
If your hardware is not supported in Linux, it is mostly crappy hardware.
When you run multi GPU I am assuming no SLI setup is required. Is that right?
That is correct - no SLI required!
What is LocalAI ? Is it like CGBT ?
localai.io/
An interesting part of this test that you didn't cover in your charts is the power draw (which was visible during the testing), beyond mentioning heating your house with M40s. The cost of power, plus the card, should be considered for "bang for the buck". In this case, it appears that with the popularity gain pushing up prices for the Tesla cards and the A4500 (~$1500 today on eBay for used cards), the 4060 Ti represents by far the best bang for the buck in a consumer machine of what you tested.
Thanks for watching we try to make as much data available as possible. You may want to check out our Mistral Leaderboard (robotf.ai/Mistral_7B_Leaderboard) and associated videos/tests where we do tend to track average power usage on the cards for the community.
Thanks for doing the stable diffusion tests. I had written off cards this old because I thought they wouldn't do video. I'll have to re-think my server build now.
They will do it! Not with speed but will do it. Still not bad for 10 year old cards that can be had fairly cheap.
Have we discovered what AI can't do?
It still hasn't been able to make me a pizza 🍕 - it however can order one for me so I guess there is that! 🤣
I'm sad that there are no $249 Jetson Orin Nano Supers available. I have a solution that needs 5 different small models running simultaneously and the 8GB cards would have been great for that. I have two AI inference servers: one 7900 XTX and one 4060 Ti 16GB. Loved the hardware modification video, I definitely wouldn't call it carnage until you break out the giant tin snips and break a few PCBs and rewire a few traces :)
Interesting thoughts Shawn, I hadn't even considered using the Jetsons really - just seemed out of sorts, but they would def be low power consumption. I love me some mini PCs so might have to re-look at them. I also have needs to run several smaller models in parallel, which is why I ended up shoving these old cards in the server so I can use it as a backup/secondary inference server for some of my automation/apps. It might also be an additional node for distributed inference if/when needed. I did use tin snips along with the jigsaw! 🤣 I tried to avoid any sparks or damage to the actual electronics and only had to pull out the soldering station for a few wires for the fans. 🔥
Have you tested P40s? Were you using llama.cpp directly? Can you dive into context length and context caching, tensor parallelism etc with llama.cpp on these older cards ❤
I don't have Pascal based cards (well worth testing, just a P2000) in the lab to really play with. I run LocalAI (localai.io/) which in these tests for inference uses llama.cpp under the hood. LocalAI is a server/wrapper for many backend engines. I have touched on tensor splitting (not necessarily parallelism) in another video - what would you like to see more on?
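For anyone poking at tensor splitting directly rather than through LocalAI's config, llama-cpp-python exposes the same llama.cpp knob. A sketch with placeholder paths and split ratios:

```python
# Illustrative sketch only (not the LocalAI configuration itself):
# llama-cpp-python passes tensor_split through to llama.cpp, distributing the
# model across visible GPUs in the given proportions. Path and ratios are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3.1-70b-instruct-q4.gguf",  # placeholder path
    n_gpu_layers=-1,          # offload all layers to the GPUs
    tensor_split=[0.5, 0.5],  # proportion of the model per visible GPU
    n_ctx=8192,
)

out = llm("Q: What does tensor splitting do?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```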
1st
Warning - if you are sensitive to hardware carnage don't watch the first parts 🤣 Cases are SO confining.... Happy holidays and what are you doing with your old junk hardware sitting around?
Dunno, going from 10 tokens/sec to 30 tokens/sec - yes it's 3x, but it still feels slow. Excellent content, thank you
Thanks for watching! We try to give data to folks - you might be more of a 3090/4090 💰 fan and can check out our leaderboard series and website (robotf.ai/Mistral_7B_Leaderboard)
I am going back into your videos to see if you did one, but would LOVE to see what a p40 will do.
Apologies, haven't had my hands on many Pascal based cards in the lab - minus the P2000 I use for video encoding. I mostly skipped over that generation in my hardware upgrades in the lab. But we did take the P2000 through a bit of the Leaderboard tests on the channel in a different video. 🤷
My Apple federal contact was telling me about TB -> fiber channel interfaces. Wondering about a cluster of Mac M4 minis with fiber channel.
In theory... slower than a single one, however would have capabilities to run VERY large models or many smaller models at once. With FC based storage would also help with initial loading time of models (fast storage -> fast memory/vram = quicker loading times)
I'm very curious how my brand new 64g M4 Pro mini will do compared to the M1 max. Do you know if any use of the inference cores is on the horizon or valuable? Maybe only for training? I'm pretty new to this. In fact so far I have only installed Ollama, which I assume is way slower, and not Metal?
LocalAI (built from source, which is what is shown in this video), llama.cpp, and Ollama all support Metal in one way or another. If just using it for quick inference and maybe a basic API server endpoint, I would highly suggest just running LM Studio on the Mac - easiest and a great interface. You can turn on the server and use it as an endpoint for coding assistants/etc.
@ Thanks my friend. That sounds like good advice. The company I work for just starting building an AI in AWS. I'm just a lowly 25 year veteran sys admin working with NASA as a contractor.
Great video. Seems 4060Ti is still too slow for serious use.
Great video, thanks. I would suggest making the particular screens you're focusing on bigger and more readable.
Thanks for the tip! I stopped recording in 4K and tried to make things more visible for folks in the newer videos; this one is kinda old now, so my apologies.
Is it a must to put the same GPUs in there to run an LLM on multi-GPU, or can I go with a 3090 and 4090 together?
You can absolutely mix different types of Nvidia cards together. We have a video that directly answers that question (and do it all the time in our lab). th-cam.com/video/guP0QW_9410/w-d-xo.html
@RoboTFAI can you mix Nvidia and AMD?
Matrix cores: Am I a joke to you?
So... What's your favorite pizza? And what's your AI's favorite pizza?
I like pepperoni, mushroom, spinach on wood fired pizza! The AI likes power, vram, and sprinkle of bandwidth
Happy Holidays Robots! 🤖🎄🎉
I'm on an outdated high-end 4GB GPU from 2014 (so no acceleration) and an outdated high-end CPU from 2010, lmfao. It takes several hours before the first letter of a response is typed back to me, so I can only really do a single prompt a day hahaha. Also, the whole time it's thinking my system is sucking down 270W, and it only idles at 145W.... I guess if my GPU helped it would add another 250W or so...
Haha I feel you there!
great benchmark. 4060ti seems to be a good choice for running self-fine-tuned 8b models locally, decent speed and low cost.