RoboTF AI
United States
Joined Mar 20, 2024
Just another engineer fooling around in a GPU powered Kubernetes based AI/LLM Lab. We play around with different GPUs, different Large Language Models, run tests and see what we can learn together.
I am not an expert, and make bad decisions, please do not follow anything I do on this channel.
Building Ubuntu Server for AI LLMs from scratch Part 3: ComfyUI, Open WebUI, Flowise, n8n, and more!
Even with a cold and a terrible voice - this week we are working on Part 3 of the series, getting a huge set of local AI tools up and running to use and play with on top of our LocalAI server. We get ComfyUI, Open WebUI, Flowise, n8n, Postgres, Chroma, and the Unstructured API all running locally in our lab! We also quickly cover integrating VS Code with our AI lab using Continue.dev (www.continue.dev/) and our LocalAI stack. Everything done locally - go build your workflows, agents, and more! (A minimal sketch of calling the LocalAI API from Python follows the repo link below.)
ComfyUI - Image/Video Generation - github.com/comfyanonymous/ComfyUI
Open WebUI - For local served chat application - github.com/open-webui/open-webui
Flowise - No-code drag-and-drop AI/LLM application builder - github.com/FlowiseAI/Flowise
n8n - Low-code AI/LLM automation builder - github.com/n8n-io/n8n
Postgres - Database backend for Flowise and n8n
Chroma - AI/LLM database for embeddings and vector storage - docs.trychroma.com/docs/overview/introduction
Unstructured API - For pre-processing many file types - github.com/Unstructured-IO/unstructured-api
Github Repo for RoboTF AI Suite of compose files and instructions:
github.com/kkacsh321/robotf-ai-suite
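Everything in this stack (Open WebUI, Flowise, n8n, Continue.dev) talks to LocalAI through its OpenAI-compatible API. As a minimal sketch, assuming LocalAI is listening on localhost:8080 and using a hypothetical model name, a chat call from Python looks like this:

```python
# Minimal sketch: chat completion against LocalAI's OpenAI-compatible API.
# The base URL, port, and model name below are assumptions for illustration --
# match them to whatever your localai-compose.yaml actually exposes.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # assumed LocalAI address/port
    api_key="not-needed-for-local",       # LocalAI does not require a real key by default
)

response = client.chat.completions.create(
    model="llama-3.1-8b-instruct",        # hypothetical model name
    messages=[{"role": "user", "content": "Say hello from the lab."}],
)
print(response.choices[0].message.content)
```

Roughly the same base URL is what the other tools get pointed at when you configure an OpenAI-compatible provider in them.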
Just a fun day on the workbench, grab your favorite relaxation method and join in.
Our website: robotf.ai
Machine specs here: robotf.ai/Machine_Lab_Specs
(These are affiliate-based links that help the channel if you purchase from them!)
Machine Components:
Open Air Case amzn.to/4a7V9pi
30cm Gen 4 PCIe Extender amzn.to/3Unhclh
20cm Gen 4 PCIe Extender amzn.to/4eEiosA
2 TB NVME amzn.to/4gWFcFb
EVGA SuperNova 1600 G+ Power Supply amzn.to/3XWorBB
240GB Crucial SSD amzn.to/406aIJA
G.SKILL Ripjaws V Series DDR 64GB Kit amzn.to/4dAZrWm
Core I9 9820x amzn.to/47UuIST
Thermalright AK90 CPU Cooler: amzn.to/4iYRSwf
Noctua Thermal Paste: amzn.to/4fMenSq
Supermicro CX299-PGF Logic Board amzn.to/3BxbWVr
Remote Power Switch amzn.to/3BubQOg
Your results may vary due to hardware, software, model used, context size, weather, wallet, and more!
Views: 546
Videos
Building Ubuntu Server for AI and LLMs from scratch Part 2: Nvidia Cuda Drivers, Toolkit, LocalAI!
866 views · 14 days ago
Building Ubuntu Server for AI and LLMs from scratch Part 2: Nvidia Cuda Drivers, Toolkit, LocalAI! This week we are working on Part 2 of the series and getting the node bootstrapped with all the necessary software needed to run our first target of LocalAI (localai.io/) with Docker and Docker Compose. Github Repo for RoboTF AI Suite of compose files and instructions: github.com/kkacsh321/robotf-...
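Before LocalAI will see the GPUs, the driver and toolkit install has to be visible from userspace. One quick sanity check, assuming a CUDA-enabled PyTorch happens to be installed on the node (it is not required by the setup itself), is:

```python
# Sanity-check sketch (assumes a CUDA-enabled PyTorch install, which is not
# part of the LocalAI setup itself): confirm the Nvidia driver and CUDA
# runtime are visible before wiring up Docker and LocalAI.
import torch

print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```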
Building Ubuntu Server for AI and LLMs from scratch Part 1: Hardware build, Open Air Case, Multi-GPU
2.1K views · 21 days ago
Building Ubuntu Server for AI and LLMs from scratch Part 1: Hardware build, Open Air Case, Multi-GPU This week we are starting to build another AI server from scratch using what we have laying around in the lab, plus some new parts. We will walk through putting the case and hardware together while blabbing and maybe answering a few viewer questions along the way. This will be the first in ...
Cutting up a Dell R720xd to add 2x Nvidia Tesla M40 24GB GPUs - Llama 3.3 70B, QwQ 32B, and more!
683 views · 21 days ago
Cutting up a Dell R720xd to add 2x Nvidia Tesla M40 24GB GPUs - Llama 3.3 70B, QwQ 32B, and more! This week we are literally cutting up my Dell R720xd TrueNAS Scale server to add 2 Nvidia Tesla M40 24GB GPUs (old cards from my original setup) as a backup inference server for when my other nodes might be under maintenance. We will take it through Llama 3.3 70B IQ4, Qwen QwQ 32B Preview, Mistra...
3 Uncensored AI comedians do the 12 Days of Christmas and roast AI Engineers with homelabs! 🎄
117 views · 28 days ago
3 Uncensored AI comedians do the 12 Days of Christmas and roast AI Engineers with homelabs! 🎄 This week on the RoboTF Development Desk: We whip together the ability to have three different AI Robot comedian personalities running a comedy show together. Random conversation, Text To Speech, and so on. Tonight's topic is "AI Engineers with GPU Homelabs and the 12 days of Christmas" Happy Holidays!...
LocalAI LLM Testing: Llama 3.3 70B Q8, Multi GPU 6x A4500, and PCIe Bandwidth during inference
2.2K views · 1 month ago
LocalAI LLM Testing: Llama 3.3 70B Instruct Q8, Multi GPU 6x A4500, and PCIe Bandwidth during inference This week we are taking Llama 3.3 70B (huggingface.co/bartowski/Llama-3.3-70B-Instruct-GGUF) at a Q8 quant running 96k of context through some tests but focusing on showing the PCIe bandwidth during inference in a multi gpu setup. Hopefully providing more insight for hardware requirements, an...
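For anyone who wants to reproduce the PCIe-bandwidth view outside of the dashboards shown on screen, one possible approach (not necessarily the tooling used in the video) is to poll NVML from Python while inference is running:

```python
# Rough sketch of watching per-GPU PCIe traffic during inference via NVML
# (pip install nvidia-ml-py). The 1-second poll interval is arbitrary;
# nvmlDeviceGetPcieThroughput reports KB/s over a short sampling window.
import time
import pynvml

pynvml.nvmlInit()
handles = [pynvml.nvmlDeviceGetHandleByIndex(i)
           for i in range(pynvml.nvmlDeviceGetCount())]

while True:
    for i, handle in enumerate(handles):
        tx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_TX_BYTES)
        rx = pynvml.nvmlDeviceGetPcieThroughput(handle, pynvml.NVML_PCIE_UTIL_RX_BYTES)
        print(f"GPU{i}: TX {tx / 1024:.1f} MB/s  RX {rx / 1024:.1f} MB/s")
    time.sleep(1.0)
```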
3 AI Comedians talk about Running a GPU Homelab for AI and Machine Learning using uncensored models!
252 views · 1 month ago
3 AI Comedians talk about Running a GPU Homelab for AI and Machine Learning using uncensored models! 🦾 🤣 This week on the RoboTF Development Desk: We whip together the ability to have three different AI Robot comedian personalities running a comedy show together. Random conversation, Text To Speech, and so on. Tonight's topic is "Running expensive GPU's in a homelab for AI and Machine Learning"...
Project: Count your tokens for Huggingface models! 🪙 AutoTikTokenizer and RoboTF LLM Token Estimator
189 views · 1 month ago
Project: Count your tokens for Huggingface models! 🪙 AutoTikTokenizer and RoboTF LLM Token Estimator This week on the RoboTF Development Desk: Tired lab day after Thanksgiving... We use Streamlit, Python, and AutoTikTokenizer to create a quick Token Estimator for HuggingFace open source models! I want to encourage everyone to go build something, anything, learn, and have some fun along the way. ...
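The core of such an estimator is tiny. A minimal sketch, assuming the autotiktokenizer package's from_pretrained interface and using a placeholder Hugging Face model id:

```python
# Minimal token-count sketch (pip install autotiktokenizer). The model id is a
# placeholder; from_pretrained returns a tiktoken-style encoder whose encode()
# output length is the token count for that model's tokenizer.
from autotiktokenizer import AutoTikTokenizer

encoder = AutoTikTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.3")
text = "How many tokens will this prompt cost me?"
print(len(encoder.encode(text)), "tokens")
```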
Mistral 7B LLM AI Leaderboard: Gigabyte AMD Radeon RX 7600 XT 16GB uses ROCm to hit leaderboard!
1.5K views · 1 month ago
Mistral 7B LLM AI Leaderboard: Gigabyte AMD Radeon RX 7600 XT 16GB uses ROCm to hit leaderboard! This week in the RoboTF lab: We take the Gigabyte AMD Radeon RX 7600XT through the Mistral 7B Leaderboard tests using ROCm. Where do you think it will land? Final Results around 15 min mark RoboTF Website: robotf.ai/Mistral_7B_Leaderboard Spend a ☕️ worth of time in the RoboTF lab, and let's put som...
LLM Testing Behind the Scenes AMD Radeon 7600 XT ROCm set flash_attention to true! 🤦♂️
790 views · 1 month ago
LLM Testing Behind the Scenes AMD Radeon 7600 XT ROCm set flash_attention to true! 🤦♂️ This week in the RoboTF lab: Quick follow up on the interesting performance profile with ROCm and the quant versions. TLDR - turn flash_attention to true, watch the performance difference, and watch me facepalm for not trying this earlier. This is the previous video on the flash_attention setting with
LLM Testing Behind the Scenes Unboxing Gigabyte AMD Radeon 7600 XT 16GB ROCm vs Vulkan battle!
937 views · 2 months ago
LLM Testing Behind the Scenes Unboxing Gigabyte AMD Radeon 7600 XT 16GB ROCm vs Vulkan battle! This week in the RoboTF lab: We do some behind the scenes testing for pre-leaderboard runs with a brand new AMD Radeon 7600 XT 16GB and bring everyone along for the ride. We need to figure out which backend to run this card with for the leaderboard tests. Turns into ROCm vs Vulkan backend battle - whi...
Mistral 7B LLM AI Leaderboard: Unboxing AsRock Intel ARC A770 16GB and running it through our tests!
1.9K views · 2 months ago
Mistral 7B LLM AI Leaderboard: Unboxing AsRock Intel ARC A770 16GB and running it through our tests! This week in the RoboTF lab: We unbox a brand new AsRock Intel ARC A770 16GB and I fight it for weeks! A mash of Linux, Intel drivers, Intel oneAPI kit issues, etc., etc....but it works with the SYCL backend and llama.cpp - let's run it through the leaderboard tests. Where do you think it will land amo...
Mistral 7B LLM AI Leaderboard: Apple M1 Max MacBook Pro uses Metal to take on GPU's for their spot!
4K views · 2 months ago
Mistral 7B LLM AI Leaderboard: Apple M1 Max MacBook Pro uses Metal to take on GPU's for their spot! This week in the RoboTF lab: Quick mid week update where we take my personal MacBook Pro M1 Max with 32GB from 2021 through the Mistral Leaderboard tests. Where do you think it will land amongst the GPUs and other CPUs we have tested? Final results at 13 Min Mark. Leaderboard is live: robotf.ai...
Mistral 7B LLM AI Leaderboard: i5-12450H 32GB DDR4 Mini PC takes on the leaderboard?
947 views · 2 months ago
Mistral 7B LLM AI Leaderboard: i5-12450H 32GB DDR4 Mini PC takes on the leaderboard? This week in the RoboTF lab: The whole lab is completely torn apart, everything in the house is a disaster, and we're fixing a really dumb power button issue with the main 6x A4500/3090 lab machine... Then we jump into running a Mini PC with a mobile-based i5 through the leaderboard. Where will it land? This was long but fun, and agai...
Halloween Stories via Streamlit, Langchain, Python, and LocalAI (or OpenAI) with Text to Speech!
239 views · 3 months ago
RoboTF Halloween Stories via Streamlit, Langchain, Python, and LocalAI (or OpenAI) with Text to Speech This week in the RoboTF lab: I want to encourage everyone to go build something, anything, learn, and have some fun along the way. We introduce the RoboTF Halloween Stories application, walk through it a bit, listen to a few stories, and play with different setups. We even do a quick demo of LocalAI ru...
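Under the hood, the LangChain-to-LocalAI wiring is just pointing an OpenAI-style chat model at the local endpoint. A hedged sketch (the URL and model name are assumptions, not the app's actual config):

```python
# Sketch of swapping LocalAI in for OpenAI in a LangChain app. The endpoint
# URL and model name are illustrative placeholders only.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8080/v1",  # assumed LocalAI endpoint
    api_key="not-needed-for-local",
    model="llama-3.1-8b-instruct",        # hypothetical model name
)

story = llm.invoke("Tell me a two-sentence spooky homelab story.")
print(story.content)
```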
Mistral 7B LLM AI Leaderboard: Nvidia RTX A4500 GPU 20GB Where does prosumer/enterprise land?
755 views · 3 months ago
LocalAI LLM Tuning: WTH is Flash Attention? What are the effects on memory and performance? Llama3.2
467 views · 3 months ago
Mistral 7B LLM AI Leaderboard: Unboxing an Nvidia RTX 4090 Windforce 24GB Can it break 100 TPS?
3.3K views · 3 months ago
Mistral 7B LLM AI Leaderboard: The King of the Leaderboard? Nvidia RTX 3090 Vision 24GB throw down!
739 views · 3 months ago
Mistral 7B LLM AI Leaderboard: Unboxing an Nvidia RTX 4070Ti Super 16GB and giving it run!
1.5K views · 4 months ago
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia RTX 4060Ti 16GB
906 views · 4 months ago
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia Tesla M40 24GB
813 views · 4 months ago
Mistral 7B LLM AI Leaderboard: GPU Contender Nvidia GTX 1660
565 views · 4 months ago
Mistral 7B LLM AI Leaderboard: Rules of Engagement and first GPU contender Nvidia Quadro P2000
423 views · 4 months ago
Mistral 7B LLM AI Leaderboard: Baseline Testing Q3,Q4,Q5,Q6,Q8, and FP16 CPU Inference i9-9820X
406 views · 4 months ago
Mistral 7B LLM AI Leaderboard: Baseline Testing Q3 CPU Inference i9-9820X
373 views · 4 months ago
LocalAI LLM Testing: Part 2 Network Distributed Inference Llama 3.1 405B Q2 in the Lab!
2K views · 5 months ago
LocalAI LLM Testing: Distributed Inference on a network? Llama 3.1 70B on Multi GPUs/Multiple Nodes
8K views · 5 months ago
LocalAI LLM Testing: Llama 3.1 8B Q8 Showdown - M40 24GB vs 4060Ti 16GB vs A4500 20GB vs 3090 24GB
16K views · 5 months ago
LocalAI LLM Testing: How many 16GB 4060TI's does it take to run Llama 3 70B Q4
22K views · 6 months ago
First 5:30 is talking... I bet it was going to start any time but I just ran out of patience.. GR8T Job. You RoCk!
Thanks!
Wow thanks for that, much appreciated!
Hi, I am new to all of this. I followed the video, but got an error when I ran the docker command after all the docker images finished downloading. My Ubuntu system has one 512GB NVMe drive. Error response from daemon: error while creating mount source path '/var/localai/models': mkdir /var/localai: read-only file system
Do a `git pull origin main` to pull the latest code down (that path is very specific to my setup) - or you will need to change the `volumes` setup in the localai-compose.yaml to mimic this instead github.com/kkacsh321/robotf-ai-suite/blob/main/LocalAI/localai-compose.yaml#L24C1-L26C26 which is just `- ../models:/models`.
Thank you for the quick reply. Yes, I did pull your latest git repo, set the volume path /var/localai/models, created the two directories /var/localai/ and /var/localai/models with mkdir, and followed the instructions to run newgrp docker and set permissions.
Absolutely excellent work! Just what I needed. Just in time. :)
Great to hear!
Did bro just cover up the sound of his toilet flushing by going "wow did yall hear those jets flying over the house?"😂😂
If only I could have the full lab setup in the bathroom! 🚽
Can you mix GPUs with different VRAM to run Llama? E.g. 24GB and 16GB RTX 40 series?
Yep, you can absolutely mix Nvidia cards together. We have another video where we address just this question: th-cam.com/video/guP0QW_9410/w-d-xo.htmlsi=lHSbK6WSS8lvdoFV if you want a more in-depth answer.
Thanks for sharing all this invaluable information. Nicely explained from the start and understandably. 🎉
My pleasure 😊
I was tinkering with llama2.cpp last year, and had a lot of fun with it in the home lab. Been away for a while, to say I'm more than impressed with both localai and this video series is an understatement. Excellent, excellent work. Whoever the hell you are, a big thank you!
Hey, I love the channel. I started making my own setup a few months ago and came across this when researching Docker. New to programming environments and Ubuntu. I come from the industrial automation side, PLCs and networking. I am going to do a fresh install following your guide on a node I am setting up for practice.
Much appreciated! Hope you find it all a good starting place to get up and going quickly, getting started is always the hardest part.
@RoboTFAI exactly. I got my primary system set up after a month of configuring. I learned a lot about Linux in the process through troubleshooting, and I think 5 installs and builds from the ground up. This will be perfect for the secondary setup I want to connect to. I will get you a coffee for the time and energy, lol
If I interpret your test right, then you have enough memory with 3 cards (48GB), but not enough GPU processing power. I have a 3090 and 2x 4060 Ti 16GB. Will test if I get more than 5 t/s...
The speed is more to do with the limited 128-bit memory interface and 272 GB/s of bandwidth of the 4060. Your 3090, for example, is three times faster with its 384-bit memory interface and 935.8 GB/s of bandwidth. If you swapped those two 4060s for another 3090, you should achieve ~15+ t/s. An M4 Max with its 546 GB/s of bandwidth runs at 8-9 t/s. The new Strix Halo with its memory bandwidth of 238 GB/s should run at around ~4 t/s. My AM5 board with RAM set at 6000 MHz has a memory bandwidth of 90 GB/s and runs 70B Q4 at 1.5 t/s.
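The rule of thumb behind those numbers: for a memory-bandwidth-bound decoder, single-stream tokens per second is roughly bandwidth divided by the bytes read per token (about the model's size in VRAM). A back-of-the-envelope check, using the bandwidth figures quoted above and an assumed ~40 GB working size for a 70B Q4 model:

```python
# Back-of-the-envelope estimate only: single-stream decode speed is roughly
# memory bandwidth / bytes touched per token (~ model size in VRAM).
# Bandwidth figures are the ones quoted in the comment above; the 40 GB model
# size is an assumed round number for a 70B Q4 quant.
def est_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    return bandwidth_gb_s / model_size_gb

MODEL_GB = 40.0
for name, bw in [("RTX 4060 Ti", 272.0), ("RTX 3090", 935.8), ("M4 Max", 546.0)]:
    print(f"{name}: ~{est_tokens_per_sec(bw, MODEL_GB):.1f} t/s upper bound")
```

Real multi-GPU setups land below this upper bound because of splitting overhead, which is consistent with the observed numbers.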
I have an R730xd with 1 M40 installed. I'm looking to install my second M40 but it looks like it physically doesn't fit into the second PCIe slot. Do I have to cut something? Hahah
I don't have an R730, but assume it's very similar to the 720 - the best I could say is maybe? It should support two full-length cards according to the docs....I would be concerned about the cooling of course. I have my other PCIe slots filled, so there was 0 chance for me, and I'm not afraid of hacking things up!
Have you tested llama 70b on the cpu only?
Wow, I thought that would be much, much more traffic. So my Thunderbolt 3 eGPU is perfectly fine as a complement to an internal card.
most people do, at least for just inference - which is why I decided I would show folks the data from "enterprise" level cards.
This video is really good. Very informative. I would like to know how different PCIe slot sizes affect tensor splitting, as the cards send their results to each other.
As in say 16x vs 8x etc? I think we could do that - if interested in just seeing the PCIe traffic in a multi-gpu setup you may enjoy this video th-cam.com/video/ki_Rm_p7kao/w-d-xo.html where we show PCIe traffic during loading and inference with enterprise grade cards that emit those metrics.
@RoboTFAI Hey, thank you very much. I'll see it tonight.
would be interesting to see some 13/20/70b models too
hi, 2x RTX 4060 Ti 16GB = 32GB VRAM, more or less??
Yes, at least for running inference!
Thank you so much for sharing this; it’s been really useful. Could you please consider creating a written guide as well?
Sure thing! Will add the base commands for bootstrapping and getting started etc. to the repo as soon as I get time. Thanks for the suggestion!
I am certainly learning stuff from your videos. Perfect pace. Thank you for your efforts.
Much appreciated and great to hear!
This is amazing. Thanks for all the effort
My pleasure! Came from all the viewers asking for these guides. Appreciate you guys watching
Great video, exactly the content that was necessary. I propose a K8s setup for one of the next videos... I played locally with Ollama, vLLM, Triton, and Triton with the vLLM backend. But LocalAI was the next thing I wanted to check out...and just in time ;)
Much appreciated! Great idea, I think we will go through a few more things - then repeat for K8s (since that's what I prefer to have my nodes in).
Hi, one question: if you mix cards, will they all run at the speed of the slowest one, or will each one run at its max? How much do the cards talk to each other, or do they just split the work and go?
We have another older video answering that question directly if you want to watch th-cam.com/video/guP0QW_9410/w-d-xo.html
@RoboTFAI Thanks, excellent video. It would be great to test a model that does not fit in a single card, and see how different mixes of cards perform when filling the VRAM.
Excellent content as usual: Ash would be proud!
🤣 Thanks for watching, hoping this series will help people get started on their own!
this is amazing! can't wait to see how the distributed inferencing works!
We have a few other videos on the channel that touch on distributed inference! But we will cover that also in this series for people who want to do it.
how are you using Mac OS with NVIDIA GPUs?
We are not, we are running the cards on Ubuntu-based nodes (mostly in Kubernetes) remotely on the channel. We do test the Macs in a few other videos with Metal though!
AI? Apple Intelligence? ;)
I am a retired/disabled IT person myself. I have an Unraid server I like a lot. I have been playing with AI stuff but find my Nvidia 3060 with 12GB of VRAM has its limits. I no longer have a budget to buy much anymore. My Unraid server is so packed I cannot add any more hardware to it, and it has some key Dockers running that I don't like playing with too much. Plus my disability makes it hard to work on PCs in those big cases. So I have an old PC laying around, so I ordered a test bench station, some PCIe extender cables, and some used Nvidia K40s. I plan to do what you are doing here, so I will be following your channel. I'm just unsure if or how I can link all the Nvidia K40 24GB VRAM cards to get 48GB of VRAM. I plan to run ComfyUI, maybe run Dockers on it as well. Looking forward to more of your videos.
Not a boring video at all! I plan to build my own AI server in the very near future! Thanks for the walk through.
Glad you enjoyed it!
The intent of using multiple GPUs is to run larger models, so this test is not very useful and the results might not be surprising. So what about 33B or 70B models with this setup?
I started building mine about 3 weeks ago, but just with an RX 7900 XTX 24GB. First time trying Ubuntu with LLMs. Very soon I was stuck on a network card driver. Apparently some drivers are not available in Linux 😂😂 I am crying while looking for solutions.
If your hardware is not supported in Linux, it is mostly crappy hardware.
When you run multi GPU I am assuming no SLI setup is required. Is that right?
That is correct - no SLI required!
What is LocalAI ? Is it like CGBT ?
localai.io/
An interesting part of this test that you didn't cover in your charts is the power draw (which was visible during the testing), beyond mentioning heating your house with M40s. The cost of power, plus the card, should be considered for "bang for the buck". In this case, it appears that with the popularity gain pushing up prices for the Tesla cards and the A4500 (~$1500 today on eBay for used cards), the 4060 Ti represents by far the best bang for the buck in a consumer machine of what you tested.
Thanks for watching we try to make as much data available as possible. You may want to check out our Mistral Leaderboard (robotf.ai/Mistral_7B_Leaderboard) and associated videos/tests where we do tend to track average power usage on the cards for the community.
Thanks for doing the stable diffusion tests. I had written off cards this old because I thought they wouldn't do video. I'll have to re-think my server build now.
They will do it! Not with speed but will do it. Still not bad for 10 year old cards that can be had fairly cheap.
Have we discovered what AI can't do?
It still hasn't been able to make me a pizza 🍕 - it however can order one for me so I guess there is that! 🤣
I'm sad that there are no $249 Jetson Orin Nano Supers available. I have a solution that needs 5 different small models running simultaneously and the 8GB cards would have been great for that. I have two AI inference servers: one 7900 XTX and one 4060 Ti 16GB. Loved the hardware modification video, I definitely wouldn't call it carnage until you break out the giant tin snips and break a few PCBs and rewire a few traces :)
Interesting thoughts Shawn, I hadn't even considered using the Jetsons really - just seemed out of sorts, but they would def be low power consumption. I love me some mini PCs so might have to re-look at them. I also have needs to run several smaller models in parallel, which is why I ended up shoving these old cards in the server so I can use it as a backup/secondary inference server for some of my automation/apps. It might also be an additional node for distributed inference if/when needed. I did use tin snips along with the jigsaw! 🤣 I tried to avoid any sparks or damage to the actual electronics and only had to pull out the soldering station for a few wires for the fans. 🔥
Have you tested P40s? Were you using llama.cpp directly? Can you dive into context length and context caching, tensor parallelism etc with llama.cpp on these older cards ❤
I don't have Pascal based cards (well worth testing, just a P2000) in the lab to really play with. I run LocalAI (localai.io/) which in these tests for inference uses llama.cpp under the hood. LocalAI is a server/wrapper for many backend engines. I have touched on tensor splitting (not necessarily parallelism) in another video - what would you like to see more on?
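For anyone poking at tensor splitting directly rather than through LocalAI's config, llama-cpp-python exposes the same llama.cpp knob. A sketch with placeholder paths and split ratios:

```python
# Illustrative sketch only (not the LocalAI configuration itself):
# llama-cpp-python passes tensor_split through to llama.cpp, distributing the
# model across visible GPUs in the given proportions. Path and ratios are placeholders.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3.1-70b-instruct-q4.gguf",  # placeholder path
    n_gpu_layers=-1,          # offload all layers to the GPUs
    tensor_split=[0.5, 0.5],  # proportion of the model per visible GPU
    n_ctx=8192,
)

out = llm("Q: What does tensor splitting do?\nA:", max_tokens=64)
print(out["choices"][0]["text"])
```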
1st
Warning - if you are sensitive to hardware carnage don't watch the first parts 🤣 Cases are SO confining.... Happy holidays and what are you doing with your old junk hardware sitting around?
Dunno, going from 10 tokens/sec to 30 tokens/sec - yes it's 3x, but it still feels slow. Excellent content, thank you
Thanks for watching! We try to give data to folks - you might be more of a 3090/4090 💰 fan and can check out our leaderboard series and website (robotf.ai/Mistral_7B_Leaderboard)
I am going back into your videos to see if you did one, but would LOVE to see what a p40 will do.
Apologies, haven't had my hands on many Pascal based cards in the lab - minus the P2000 I use for video encoding. I mostly skipped over that generation in my hardware upgrades in the lab. But we did take the P2000 through a bit of the Leaderboard tests on the channel in a different video. 🤷
My Apple federal contact was telling me about TB -> fiber channel interfaces. Wondering about a cluster of Mac M4 minis with fiber channel.
In theory... slower than a single one, however would have capabilities to run VERY large models or many smaller models at once. With FC based storage would also help with initial loading time of models (fast storage -> fast memory/vram = quicker loading times)
I'm very curious how my brand new 64g M4 Pro mini will do compared to the M1 max. Do you know if any use of the inference cores is on the horizon or valuable? Maybe only for training? I'm pretty new to this. In fact so far I have only installed Ollama, which I assume is way slower, and not Metal?
LocalAI (built from source, which is what is shown in this video), llama.cpp, and Ollama all support Metal in one way or another. If just using it for quick inference and maybe a basic API server endpoint, I would highly suggest just running LM Studio on the Mac - easiest and a great interface. You can turn on the server and use it as an endpoint for coding assistants/etc.
@ Thanks my friend. That sounds like good advice. The company I work for just starting building an AI in AWS. I'm just a lowly 25 year veteran sys admin working with NASA as a contractor.
Great video. Seems 4060Ti is still too slow for serious use.
Great video, thanks. I would suggest making the particular screens you're focusing on bigger and more readable.
Thanks for the tip! I stopped recording in 4K and tried to make things more visible for folks in the newer videos; this one is kinda old now, so my apologies.
Is it a must to put the same GPUs in there to run an LLM on multi-GPU, or can I go with a 3090 and 4090 together?
You can absolutely mix different types of Nvidia cards together. We have a video that directly answers that question (and do it all the time in our lab). th-cam.com/video/guP0QW_9410/w-d-xo.html
@RoboTFAI can you mix Nvidia and AMD?
Matrix cores: Am I a joke to you?
So... What's your favorite pizza? And what's your AI's favorite pizza?
I like pepperoni, mushroom, spinach on wood fired pizza! The AI likes power, vram, and sprinkle of bandwidth
Happy Holidays Robots! 🤖🎄🎉
I'm on an outdated high-end 4GB GPU from 2014 (so no acceleration) and an outdated high-end CPU from 2010, lmfao. It takes several hours before the first letter of a response is typed back to me, so I can only really do a single prompt a day hahaha. Also, the whole time it's thinking my system is sucking down 270W, and it only idles at 145W.... I guess if my GPU helped it would add another 250W or so...
Haha I feel you there!
great benchmark. 4060ti seems to be a good choice for running self-fine-tuned 8b models locally, decent speed and low cost.