Gaming on NVIDIA Tesla GPUs - Part 2 - NVIDIA Pascal

  • Published Oct 4, 2024

Comments • 216

  • @zeroforkgiven
    @zeroforkgiven 3 months ago +56

    Price at the launch of this video for the Tesla P4 is ~$105 on eBay. Very curious what it will be tomorrow.

    • @CraftComputing
      @CraftComputing 3 months ago +39

      Do I post a pre-emptive sorry, or wait until they're $250?

    • @logan_kes
      @logan_kes 3 months ago +11

      @@CraftComputing Something has to have driven up P4 prices in the last month or so. In December last year I picked up a pair of P4's for $85 each (trending $75-$80 from China or $80-90 from U.S. sellers) and now they have shot up to close to $110 for U.S. sellers and $95 for China sellers. I'd imagine your video will bump these up even more 😅 Glad I just got 2x Tesla P40's last week before those go up too 😂
      Keep up the great content. As you get more and more popular, you will start to mess with the used enterprise equipment market more and more, to the point you will need to put a disclaimer in your videos stating *pre-video pricing* lol

    • @zeroforkgiven
      @zeroforkgiven 3 months ago +7

      @@CraftComputing LOL, it's the Craft Effect. I don't mind, as I already own 2 of them (the best Plex hardware card IMO) and the prices will fall back down in a few weeks.

    • @garthkey
      @garthkey 3 months ago +4

      Yeah, after he posts videos they spike. I just bought the ASUS gaming server from a couple videos ago. The original price was $175. Then it spiked to $250.

    • @JPDuffy
      @JPDuffy 3 months ago +4

      I bought one for $50 in February. It's excellent, but I don't think it's worth $100+ considering the extra work to set up. I have it running in a 4790K Dell and it plays 1080p games at max settings at 60+ FPS without breaking a sweat.

  • @BrinkGG
    @BrinkGG 3 months ago +27

    I've been waiting for this one! Was holding out on buying a P40 or P100 until this came out. Thanks Jeff. :D

  • @ProjectPhysX
    @ProjectPhysX 3 months ago +46

    The main difference between the P100 and P40 is not the VRAM. The P100 has a 1:2 FP64:FP32 ratio; for the P40 (and all other Pascal GPUs) it's 1:32, making them basically incapable of FP64.
    The P100 is much better for certain computational physics workloads that need the extra precision, like molecular dynamics or orbital mechanics.

    • @OliverKr4ft
      @OliverKr4ft 3 months ago +10

      All games use FP32 though, so the additional FP64 FPUs on the P100 make no difference

    • @sidichochase
      @sidichochase 3 months ago +7

      @@OliverKr4ft For gaming, yes. But for people who want a nice cheap GPGPU, the P100 is the better choice.

    • @TheLibertyfarmer
      @TheLibertyfarmer 3 months ago

      The P100 can also do a 2:1 FP16:FP32 ratio, which makes it much faster at half-precision training than the P40, as well as having NVLink support, and thus more efficient for training in general.

    • @gg-gn3re
      @gg-gn3re 3 months ago +1

      @@OliverKr4ft Yea, gaming isn't "molecular dynamics or orbital mechanics" if you didn't know

    • @MiG82au
      @MiG82au 3 months ago

      @@OliverKr4ft The claim in the video is that the unique chip exists for HBM2, which is arguably wrong and at best only half the story, because the biggest difference is the huge count of FP64 execution units.
      Whether games use FP64 or not is irrelevant to why the GP100 chip exists.
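The FP64 ratios discussed in this thread can be sanity-checked with back-of-the-envelope math. A small sketch, using core counts and boost clocks assumed from public spec sheets (not figures from the video):

```python
def peak_tflops(cuda_cores, boost_ghz, fp64_ratio):
    """Rough peak throughput: 2 FLOPs per FMA per core per cycle."""
    fp32 = 2 * cuda_cores * boost_ghz / 1000.0  # TFLOPS
    return fp32, fp32 * fp64_ratio

# Assumed spec-sheet figures for the two cards discussed:
for name, cores, clock, ratio in [
    ("Tesla P100", 3584, 1.480, 1 / 2),   # GP100: 1:2 FP64
    ("Tesla P40",  3840, 1.531, 1 / 32),  # GP102: 1:32 FP64
]:
    fp32, fp64 = peak_tflops(cores, clock, ratio)
    print(f"{name}: ~{fp32:.1f} TFLOPS FP32, ~{fp64:.2f} TFLOPS FP64")
```

Despite nearly identical FP32 throughput, the 1:2 vs 1:32 ratio puts the cards more than an order of magnitude apart in double precision.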

  • @Americancosworth
    @Americancosworth 3 months ago +40

    Hurray! More good ideas for my poor life decisions (building a cloud gaming server)

  • @samthedev32
    @samthedev32 3 months ago +6

    I have been waiting for this video for so long!
    I was planning to get a P4, and now I want it even more :)

  • @dectoasd3644
    @dectoasd3644 3 months ago +5

    1 minute in and I'm already excited with my 2 x P40

    • @Satori-Automotive
      @Satori-Automotive 3 months ago

      How does a single one perform in rendering and editing compared to something like a 1080 Ti?

  • @edgecrush3r
    @edgecrush3r 3 months ago +6

    I've been running the P4 24x7 for almost a year and absolutely love this card. I have so many projects running on this thing now that my NAS doesn't classify as a NAS anymore 😅 It's more a vGPU emulation server, enjoying Mario Kart on many connected devices with the whole family 😂 It's just so dang cheap now it's impossible to beat, and it's great for inferencing LLMs (the P100 would be better due to faster memory). I am now hoping the T4 will drop in price.

  • @davidfarnham3548
    @davidfarnham3548 3 months ago +16

    Really curious to see how a T4 performs vs a P4.

    • @KiraSlith
      @KiraSlith 3 months ago +4

      Ehhh... As I understand it, the P4 has two working NVENC engines on-die, but the T4 is a custom compute-targeted die from the word go. You'll get more for your money from the P4 if you're using it for virtualization/transcode, especially since the P4 is still staying sub-$130, where the T4 is hovering around $600 at the moment.
      If, however, you're looking for FP16 compute specifically (like for AI tasks), the T4 is fast enough that it competes with a 3090 while staying at 75W. It's a spectacular monster within that specific arena only; its FP32 is pretty miserable for its price, however, which is what games make the most use of.

  • @DerrangedGadgeteer
    @DerrangedGadgeteer 3 months ago +1

    I'm so glad you ran these benchmarks! I'm elbow-deep in building a multipurpose virtualization/AI server out of a 2nd-gen Threadripper and P100s. It's good to know what to expect, and also that my expectations weren't way off base when I started.

  • @SpoonHurler
    @SpoonHurler 3 months ago +8

    I agree with you on benchmarking LLMs and AI (or advanced logic generation). Many benchmarks will also be irrelevant in a year (my opinion, not a fact). I wouldn't waste time producing possibly bad results in such a chaotic environment unless I was very equipped to do so.
    I do think a video of playing around with / learning LLMs could be interesting though... with no comparative numbers, just a journey episode.

    • @CraftComputing
      @CraftComputing 3 months ago +4

      Yeah, I did a couple videos on Stable Diffusion last year, where I explored running it in my homelab.

  • @anthonyguerrero4612
    @anthonyguerrero4612 3 months ago +1

    Wow, I wasn't expecting this, thank you for further experimenting. 😊

  • @Prophes0r
    @Prophes0r 3 months ago +3

    I know it is comparing completely different families, but I'm interested in comparing the P4 to an A380 in straight passthrough.
    The A380 can be had new for $120ish for the half-height cards. That's in the same ballpark.
    I know we will never get SR-IOV for the ARC cards (Except maybe the A770 with major hacks). But I do think it has the possibility of being really interesting.
    Plus, there are other comparisons to be made. The Nvidia cards likely have a gaming performance advantage, but QuickSync is SO much better than NVENC that it might make big differences when it comes to encoding those remote desktop video streams.

  • @tylereyman5290
    @tylereyman5290 3 months ago +3

    I somehow managed to snag a P100 last month for $40 off eBay. That may have been the greatest deal I have ever scored.

  • @DMS3TV
    @DMS3TV 3 months ago

    My big takeaway from this is just how impressive integrated graphics are now. These dedicated GPUs were once the bee's knees, and now a 7840U can outpace them in games. Really cool times we live in!

  • @DustinShort
    @DustinShort 3 months ago +2

    I was really surprised by the P4. I may have to try it as an energy-efficient VDI solution. At two VMs per GPU it should be more than powerful enough for light CAD work, but I bet you could squeeze in 4 VMs if you aren't working with large assemblies.

  • @m4nc1n1
    @m4nc1n1 3 months ago +1

    I still have my 1080TI on a shelf. Used it for years!

  • @buddybleeyes
    @buddybleeyes 3 months ago +1

    Lets goo! Love this cloud gaming series 😄

  • @gustersongusterson4120
    @gustersongusterson4120 3 months ago

    Great video and I love the series! Though it would be a lot easier to visualize the data in bar graph form rather than just a matrix of values.

  • @sjukfan
    @sjukfan 3 months ago +2

    Hm... is there an x16 to x8/x8 splitter with external power that can drive two 75W cards? Then you could run two P4s in an x16 slot 😛

  • @thedeester100
    @thedeester100 3 months ago

    Been using a Quadro P4000 GPU for over a year. It was half the price of a 1080 Ti on eBay at the time. I don't game a great deal anymore, but it's never failed at anything I've thrown at it.

  • @pkt1213
    @pkt1213 3 months ago +2

    I just put a P4 in my server. 4 transcodes were using ~25W. I may pick up a P40 or P100 if I want to run AI locally.

    • @pkt1213
      @pkt1213 3 months ago

      I also took the front plate off the heatsink and zip-tied a 40mm Noctua fan over the die. Haven't seen it much over 50C.

  • @gabrielramirezorihuela6935
    @gabrielramirezorihuela6935 several months ago

    The tiny Tesla is hilarious.

  • @michaelstowe3675
    @michaelstowe3675 3 months ago +2

    Good choice on the beer! Local to me!

    • @CraftComputing
      @CraftComputing 3 months ago

      Quilter's Irish Death is one of my top 20 beers. So good!!

  • @KeoniAzuara
    @KeoniAzuara 3 months ago +1

    Still rocking the M40 with 12GB and the NZXT water cooling bracket.

  • @criostasis
    @criostasis 2 months ago

    I designed and developed a RAG-based LLM chatbot for my university using GPT4All, LangChain, and TorchServe. Testing with my 16GB RTX 5000 laptop, it performed on par with just my 13900K, producing answers with memory and chat context in about 40-50 seconds. On my server with an RTX 4080 it was blazing fast; answers came in about 5-10 seconds. I'm sure a 4090 would be a bit faster, but I didn't have one to test. Concurrency on a single GPU is where you can really hit a bottleneck. You have to set up a queue and locks to handle it, but with one GPU it gets slow. That's why OpenAI and others have thousands and thousands of GPUs to handle the concurrent workloads. That, and some magic code sauce I didn't get around to implementing in my time working on it before handing it off.
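The queue-and-locks approach described above can be sketched in a few lines. `run_inference` below is a hypothetical stand-in for the real model call; the point is that a single lock serializes every request onto the one GPU, which is exactly why throughput collapses under concurrent load:

```python
import queue
import threading

gpu_lock = threading.Lock()   # one model on one GPU: serialize access
requests = queue.Queue()      # incoming prompts, FIFO

def run_inference(prompt):
    # Hypothetical stand-in for the actual model call (e.g. a RAG pipeline).
    return f"answer to: {prompt}"

def worker(results):
    while True:
        prompt = requests.get()
        if prompt is None:            # sentinel: shut down the worker
            break
        with gpu_lock:                # only one request touches the GPU at a time
            results.append(run_inference(prompt))
        requests.task_done()

results = []
t = threading.Thread(target=worker, args=(results,))
t.start()
for p in ["q1", "q2", "q3"]:
    requests.put(p)
requests.put(None)
t.join()
print(results)
```

With one worker, total latency grows linearly with queue depth; scaling past that means more GPUs (or batching), as the comment notes.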

  • @calebgrefe8922
    @calebgrefe8922 2 months ago

    I get so excited thinking about an ITX gaming build with the P4 =)

  • @clintsuperhero
    @clintsuperhero several months ago

    I had seen Titan Xp's for cheap ($180), so I bought one out of childhood dreams. The performance I've gotten from it has been great over the 1080 I used to have: less stuttering in some games, and overall better for 1440p, which both my main screens are.

  • @Majesticwalker77
    @Majesticwalker77 3 months ago

    Thanks for keeping the info within your knowledge, I definitely appreciate it.

  • @jamb312
    @jamb312 3 months ago

    I have a couple of Quadro T400s for Plex and VMs; glad I got a P4, as it's been a powerhouse for running LLMs, Recognize, etc.
    By the way, an Epyc 7302 is what I'm running, and I love it, other than it being a little heater.
    Iron Horse Brewery has its main staples, like Quilter's Irish Death, but they play with many others. I was up in their taproom last week, and the Cookie Death was only $3.

  • @Leetauren
    @Leetauren 3 months ago +2

    AI benchmarks for home labs are relevant. Please include some.

  • @TheRogueBro
    @TheRogueBro 3 months ago +3

    Random power-related question: if you were to run multiple P4's and you turn off a VM, does it "power down" the card?

    • @CraftComputing
      @CraftComputing 3 months ago +3

      All of the GPUs have idle power draw, because they're still being used by the host. There is a host driver for monitoring and partitioning the GPU. The P40 and P100 were around 12-15W. The P4 was closer to 8-10W.
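Idle draw like this is straightforward to log yourself: `nvidia-smi --query-gpu=name,power.draw --format=csv,noheader` prints one CSV line per GPU. A minimal parser for that output (the sample line is illustrative, not a measurement from the video):

```python
import csv
import io

def parse_power(smi_output):
    """Parse `nvidia-smi --query-gpu=name,power.draw --format=csv,noheader`
    output into a {gpu_name: watts} dict."""
    readings = {}
    for row in csv.reader(io.StringIO(smi_output)):
        name, draw = (field.strip() for field in row)
        readings[name] = float(draw.removesuffix(" W"))
    return readings

# Sample output line (illustrative only):
sample = "Tesla P4, 8.91 W\n"
print(parse_power(sample))
```

Run in a loop (e.g. with `-l 5` on nvidia-smi itself), this makes it easy to confirm idle-draw figures like the 8-15W quoted above on your own hardware.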

  • @KomradeMikhail
    @KomradeMikhail 3 months ago +3

    I run into significant app crashes and issues when using an HBM2 graphics card through PCIe passthrough to a VM.
    Most noticeably with KiCad and Deep Rock Galactic. They run fine on the same hardware bare-metal.
    First encountered with a Radeon VII, then tested a Titan V to compare. Same results for team red and team green.
    Tested on a Broadwell Xeon workstation, slimmed down from what Jeff runs in this video.
    Anybody else have issues passing through HBM2 cards?

    • @CraftComputing
      @CraftComputing 3 months ago +3

      I've had no issues at all. I've done testing on the P100 and V100, and haven't had any problems.

    • @OliverKr4ft
      @OliverKr4ft 3 months ago +2

      Have you stress tested the cards on bare metal? The memory type should not have any effect on stability when passed through

  • @novantha1
    @novantha1 3 months ago +1

    With regards to AI tests:
    It might be an unexpectedly sagacious decision to avoid jumping into it at the moment. We're at what is simultaneously a crossroads and a wild west, and I can only see it getting more crazy.
    In the simplest possible terms: raw FP16 compute sort of doesn't lie. Given a sufficient quantity of it (and memory bandwidth to feed it), it's pretty straightforward to multiply two matrices. But there's a problem: TOPs.
    Dedicated TOPs don't operate on the same principle as FP16 compute (and I'm giving companies' marketing divisions the credit of assuming they're talking about tensor operations when they talk about TOPs, which is not always true), so it can be hard to draw an equivalence between, for instance, the FP16 compute of a Pascal card and the tensor performance (which is often the majority of the AI performance) of a modern Nvidia GPU... to say nothing of extended instruction sets in the x86, ARM, or RISC-V space (I would love to start a YouTube channel talking about those at some point; a lot of people misunderstand CPU AI performance, including Ampere's, and now Intel's Sierra Forest marketing department).
    And then it gets even harder. Do you compare the memory access patterns or the raw performance? If you do the raw performance, a pascal GPU might hold up surprisingly well because in the end, FP16 and memory bandwidth will get you most of the way there. On the other hand, something like a CPU with VNNI extensions (Zen 4, and I think Intel's server P-cores, but not consumer) might actually perform more efficiently for its memory bandwidth in the sense that it can do lower precision AVX compute, and at a faster rate per unit of bandwidth thanks to fused instructions, but it might have a slower absolute rate of operation. Which one is better? Well, it depends on your use case.
    Plus, all of this is ignoring more exotic things like Tenstorrent's lineup (very sexy), or things like Hailo M.2 accelerators (very accessible).
    So when you add it all together...
    At what precision do you evaluate? Some accelerators (notably CPUs, NPUs, and accelerators) will perform at an outsized rate on lower precision, particularly integer operations like int8. Common high performance AI models are not trained with those precisions in mind, so there is an accuracy loss at those precisions (And some of those losses only show up experientially, and not on standard benchmarks). Is it fair to compare an accelerator with block floating point 8 to the full FP16 of another accelerator?
    How much customization is allowed to the pipeline? Is it fair to compare image generation on Nvidia and AMD using Automatic1111 webUI, when AMD is a second class citizen there? Do you compare Automatic1111 Nvidia to nodeshark AMD?
    How do you compare an accelerator with more RAM at a slower speed to one that has a fast speed but little RAM? Some people favor accuracy/quality, while some people favor responsiveness, and some people have crazy workflows that depend on huge amounts of generations from models whose quality almost doesn't matter. In this case, the one accelerator would just be better because it can run the higher quality model, but that might not be what everyone wants.
    Is the evaluation on training or inference?
    If training, with which framework?
    Are tensor cores used?
    Do you use "pure" primitives like ResNet, or off the shelf production grade models and pipelines?
    Do you measure at large batch sizes indicative of peak performance, similar to how we do CPU evaluations in gaming benchmarks, or do we test single user latency, which is reflective of end-user engagement with the product?
    Do you focus on objective, timeless evaluation, such as by looking at peak performance (people would have had a really bad time buying hardware for AI if they bought before quantization and flash attention changed the game pretty drastically), or do you take into account the current state and usability of the hardware (people would have had a really bad time buying an unsupported AMD GPU assuming that "oh, AI's a big deal, it'll all get supported eventually").
    Honestly, at the moment it's a bit of a mess, there's not really industry standards, every option you try to test at could have default settings or customizations which vary by hardware, making it potentially not fair, and there's just not a lot of collective industry wisdom on how to do it right.
    To be honest, I'm not sure why I typed this out, I'm not sure this is going to be terribly useful for anyone, lol.

    • @CraftComputing
      @CraftComputing 3 months ago +1

      LOL, I read it. I've done some research and talked to a number of colleagues about AI performance testing, and you summed up a number of points nicely.
      Every model is built a bit differently. Every GPU has their own strengths and weaknesses with its own hardware, configuration, available features, etc.
      My 2¢, oftentimes, orgs that want to run a specific AI model will purchase the hardware that model was built for.
      Me running generic benchmarks isn't really an accurate assessment of performance, as each model will take advantage of specific GPU architecture features. As you mentioned, Int8, FP16, Tensor core, etc. performance in GPUs will wildly affect the speed of running a specific model, but that's really down to software selection of what you WANT to run, not the hardware you're running it on.
      It's a chicken-and-egg situation, but with LLM and GPU: you choose one, and it decides the other.

  • @ICanDoThatToo2
    @ICanDoThatToo2 3 months ago

    I've been learning LLM on my R720, and found something interesting: My old 1050 Ti 4GB card runs AI about 3x faster than all 16 CPU cores together (2x E5-2667 v2 chips). While neither of those options are fast in absolute terms, and the RAM is very limiting, basically _any_ GPU is better for AI than CPU only.

  • @xmine08
    @xmine08 3 months ago +1

    LLMs are, in my opinion, becoming a huge thing in homelabs. For everyone? No, but then, many homelabbers have maybe two Raspberry Pis, and yet videos like yours exist where you have full-blown real server hardware (albeit old and thus affordable). I appreciate your honesty, however, that you don't want to produce numbers that you don't feel qualified for!

  • @daidaloscz
    @daidaloscz 3 months ago

    Would love to see how you set up Sunshine + Moonlight next, especially on headless systems with no GPU output.

  • @hi-friaudioman
    @hi-friaudioman 3 months ago

    Oh baby, he's dropping the E5 v4's! we gaming now boys!

  • @ShooterQ
    @ShooterQ 3 months ago

    Just threw an unused Tesla P4 8GB into my Frigate NVR for video decoding. The thing shoots all the way up to 105C and crashes. Added an 80mm Arctic P8 with some custom ducting, and now have it working along at a constant 71C. Doing great for the $100 price tag.
    Dell Optiplex 3080 SFF, so it's the biggest card I could fit, and it works well with the available power from that slim PSU.

  • @Yuriel1981
    @Yuriel1981 3 months ago

    I think the main problem with switching to an Epyc platform is finding a board that can accommodate the 8 GPUs. The best and most affordable option I see on eBay is a 7551P and an ASRock EPYCD8 board with 4 PCIe 3.0 x16 and 3 PCIe 3.0 x8 slots. Since the last slot is an x16, you could (if you can find a case big enough, or modify one) use a P100 or P40 that suffers the double-VM affliction. But the newer platform may make up some of the difference. Ad postings run around $450, though I'm not sure if that actually includes the CPU; most similar full Epyc boards with CPU and various RAM combos range from $500-850. Might be more doable than you think.

    • @CraftComputing
      @CraftComputing 3 months ago +1

      There are a couple servers with a very similar design to the ESC4000 that accommodate either 4 or 8 GPUs. They're just insanely expensive.

  • @Seventeen76
    @Seventeen76 3 months ago +1

    Jeff, is it possible to run dual 30-series Nvidia cards for Stable Diffusion machine learning?
    I am currently using a 3060 12GB, and I am waiting on a 3080 10GB to come in the mail. Is it possible to run them at the same time? (I know you can't combine them and run them as one.) Is there somehow a way to make them both work for my desired use?
    Or is it just better to run it with the 3080 and leave the 3060 for something else?
    Edit: I have a 5950X, and I'm using an X570 Crosshair VIII Extreme motherboard, 64GB of G.Skill Trident 3600 MT/s, with a Seasonic Platinum 1000W power supply, in a Cooler Master Cosmos C700M case.

  • @blendpinexus1416
    @blendpinexus1416 3 months ago

    Got a 12GB 2060 and am happy with its performance. I thought about getting Tesla T4 GPUs (the Turing version of the P4), but the 12GB 3060 is also a runner-up for that. Similar efficiency too.

  • @MiG82au
    @MiG82au 3 months ago

    Surely there's a mistake in the Fire Strike results? The P100 x2 and P4 physics and combined scores are higher than the P40's and the single-VM P100's.

  • @montecorbit8280
    @montecorbit8280 3 months ago +1

    At 25:55:
    "...Better at 50 degrees Fahrenheit than 35 degrees Fahrenheit..."
    I remember reading somewhere that the optimum temperature for beer to be served is 40 degrees Fahrenheit; anything colder and you will "freeze out" the flavor. That information comes from a time before "artisan brews" were a thing, though.
    I take it this is no longer correct... or was it ever correct?

    • @CraftComputing
      @CraftComputing 3 months ago +1

      That's a very generic statement. Different flavors are better and worse depending on temperature. I enjoy IPAs starting at 35F, and letting them warm up to 50F while drinking, as you get a whole range of flavor and experience.
      Stouts and other dark beers are typically much better starting at 45F and letting them warm even up to room temp.
      Domestic Lagers and Pilsners, well, they're advertised ice cold because they're absolute garbage above 40F 😂

    • @montecorbit8280
      @montecorbit8280 3 months ago

      @@CraftComputing
      I have never particularly liked light beer, so I was curious. Thank you!!

  • @Sunlight91
    @Sunlight91 3 months ago

    From what I've heard, machine learning is best done at FP16, to halve the memory requirements and speed up computation. Some even do it in INT8. This means old architectures are not recommended, particularly pre-Turing ones.
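The memory-halving point is simple arithmetic on the weight count. A rough estimate of the VRAM needed for the weights alone (ignoring activations and KV cache), shown for a hypothetical 7B-parameter model:

```python
def weight_vram_gib(params_billion, bits_per_weight):
    """GiB needed just to hold the weights at a given precision."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

for precision, bits in [("FP32", 32), ("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    print(f"7B @ {precision}: ~{weight_vram_gib(7, bits):.1f} GiB")
```

At FP16 a 7B model's weights need roughly 13 GiB, which is why the 16GB P100 and 24GB P40 are attractive for this at all, and why INT8/INT4 quantization matters so much on smaller cards.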

  • @ccleorina
    @ccleorina 3 months ago +1

    I've been waiting for a P100 or P40 setup and vGPU guide, since I still can't get it to run with Proxmox 7 or 8. Still waiting for a new vGPU guide.

    • @insu_na
      @insu_na 3 months ago

      What problems are you experiencing? I've been running Proxmox with P100 vGPUs for a year and P40 vGPUs for months.

  • @LinHolcomb
    @LinHolcomb 3 months ago

    I'd still love to see tokens/sec running a few mainstream LLMs. I run 2 P40s in an AMD 5950X with 64GB RAM; truly, the processor is not used to its potential. Going to send you a pickle beer.

  • @KHITTutorials
    @KHITTutorials 3 months ago

    They have fewer cores, but the E5-2687W v4 comes quite close to desktop gaming chips. Most likely it will help with the "bottleneck", but it will impact how many machines you can run. It would be interesting to see what improvements come from it, though.

  • @al.waliiid
    @al.waliiid 3 months ago +1

    What about rendering, 3D, montage/editing times, and smoothness in Adobe Premiere Pro?

  • @zr0dfx
    @zr0dfx 3 months ago

    I'd like to see an update on the home server you did in that pre-Jonsbo-style case! I made a very similar build but used an LSI 9300i and a 10Gb M.2 adapter with TrueNAS Scale (could not get PCIe passthrough to work either).

  • @win7best
    @win7best 3 months ago

    As someone who has owned a P100 and still owns a P40 (24GB), I can say that the P40 has the better experience; also, the P100 only has 16GB, and I don't think the HBM2 memory will save it.

  • @carbongrip2108
    @carbongrip2108 3 months ago +1

    How did a single Volta GPU perform when running 2x VMs? We know you tested it 😉

    • @CraftComputing
      @CraftComputing 3 months ago +3

      Volta coming shortly ;-)

  • @SoftwareRat
    @SoftwareRat 3 months ago

    Old GeForce NOW instances used the Tesla P40 shared between two instances

  • @dualbeardedtech
    @dualbeardedtech 3 months ago +6

    In regards to your commentary on benching AI... Well said my friend!

    • @AlexKidd4Fun
      @AlexKidd4Fun 3 months ago +3

      I agree. Much respect for not presenting benchmarks for AI until you feel comfortable understanding what you're presenting. 👍👍

  • @mastermoarman
    @mastermoarman 3 months ago

    I wonder how well these work with transcoding for Plex/Jellyfin and running CodeProject.AI for security camera image recognition.

  • @matthewsan4594
    @matthewsan4594 3 months ago

    As people may use the cards for other things, like video editing, conversion, and animation, could you please do that sort of testing as well?

  • @theroyalaustralian
    @theroyalaustralian 3 months ago

    The 1080Ti is THE GOOOOOOOAAATTTT... ITS THE GOOOOAAATTT.

  • @drakkon_sol
    @drakkon_sol 3 months ago

    I have my P4 sitting in my PE-T110-II, as my decoder for Plex.
    (My PE-T110-II is my NAS, plex, MC, BeamNG server. Total cost for this 32tb server: $200 CAD)

  • @logan_kes
    @logan_kes 3 months ago

    I just got a pair of P40's in last week and have begun benchmarking them on my Dell 14G servers running Skylake and Cascade Lake Xeons. I might throw them in my old 13G with Broadwell Xeons to see if the performance of Scalable makes a noticeable jump for the massive price increase of the platform in a "cloud gaming" situation.

  • @kenzieduckmoo
    @kenzieduckmoo 3 months ago

    I support your new channel Cookie Computing

    • @CraftComputing
      @CraftComputing 3 months ago

      Today's show is brought to you by the letter "C"

  • @bobylapointe-l4r
    @bobylapointe-l4r 3 months ago

    I used a P4 on Proxmox for an AI VM. It's just good for "building" your VM, which is a very long journey: getting all the libs, drivers, and venvs. Once done, I quickly understood that self-hosted AI is all about trial and error, and waiting for the P4 became painful. Also very, very important: vGPU builds are OK for gaming but NOT for AI. It's nearly impossible to get CUDA working while using vGPU, at least not with these homemade setups. Not to mention AI is all about VRAM, and the vGPU VRAM split has a direct, dramatic impact. I ended up removing the whole vGPU setup and sticking to PCIe passthrough; that was the only way to have a multi-purpose home server for both gaming VMs and AI VMs.

    • @VinnyG919
      @VinnyG919 3 months ago

      exllama runs fine on vGPU here, with less than 10% overhead loss.

  • @forsaken1776
    @forsaken1776 3 months ago

    I've watched many of these types of videos, not to mention most of your other vids. What I'm not sure about is how your VMs are set up. Are your VMs just a VM of Windows with the game(s) installed, or is there a way to directly install the game in a VM without the overhead of a Windows or Linux OS?

  • @cyklondx
    @cyklondx 3 months ago

    You should disable ECC VRAM on either of those cards; on the P100 with ECC enabled, it loses some 30% of performance.

  • @JoshWolabaugh
    @JoshWolabaugh 3 months ago

    I might have to drop a P4 in my Dell R720 and give it a go. Thanks Jeff.

  • @MrMaker-w1r
    @MrMaker-w1r 3 months ago +4

    Feed me, Seymour, feed me.

    • @FaithyJo
      @FaithyJo 3 months ago

      Feed me all night looooong!

  • @k9man163
    @k9man163 3 months ago

    Would you be interested in testing these cards for local LLM performance? I'm curious what impact the HBM2 memory will have over the GDDR5.

  • @OMGPOKEMON47
    @OMGPOKEMON47 3 months ago

    Was the P4 tested at PCIe x16 or x8? If I understand correctly, using 8 of the P4s (vs. 4 double-slot cards) on this server would result in each slot running at x8 bandwidth. Not sure if that would make a difference in gaming performance 🤷‍♂
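The x8-vs-x16 question can be framed in raw numbers: PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding, so per-direction bandwidth scales linearly with lane count. A quick calculation:

```python
def pcie3_gbps(lanes):
    """Usable per-direction bandwidth in GB/s for a PCIe 3.0 link."""
    gt_per_s = 8.0                          # PCIe 3.0 line rate per lane
    encoding = 128 / 130                    # 128b/130b encoding overhead
    return lanes * gt_per_s * encoding / 8  # bits -> bytes

print(f"x16: {pcie3_gbps(16):.2f} GB/s, x8: {pcie3_gbps(8):.2f} GB/s")
```

An x8 link still offers close to 8 GB/s each way, which is generally plenty for a 75W card like the P4 once assets are resident in VRAM; the impact tends to show up mainly during loading and streaming.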

  • @RedneckResto
    @RedneckResto 3 months ago +1

    8 P4 FTW

  • @robe_p3857
    @robe_p3857 3 months ago

    Looking forward to AI benchmarks. Trying to decide whether to be creative or just grab a 5090.

  • @aaronburns2858
    @aaronburns2858 3 months ago

    Do you think LGA3647 machines are relevant? I just ended up with a Supermicro X10SPM-TF and a Xeon Gold 6232. I got it dirt cheap and am curious if you think it'd be good enough to run a couple machines for the kids to play Minecraft and for me to run a few other games (Fallout, Cyberpunk, Hogwarts Legacy, and mostly old titles).

  • @blehbop4268
    @blehbop4268 3 months ago

    Would you be able to test your store of GPUs, both gaming and professional, with BOINC GPU tasks, with power consumption and production in mind?

  • @spicyandoriginal280
    @spicyandoriginal280 3 months ago

    I know that you can't test everything, but I would love to know if 5C/10T makes a noticeable improvement. It opens up the possibility of a 6x P4 system with dual 16-core Xeons (2.6 GHz base clock).

  • @StevenWilliams-lb9tf
    @StevenWilliams-lb9tf 2 months ago

    Jeff, have you tried the RTX 4000? I've thought of getting one, as it claims to be close to an RTX 2080 Mobile on the Quadro wiki, but TechPowerUp claims it's more like an RX 6600. I'm thinking: do I save for the RTX 4000, or just get a P100? Single slot at 160W vs half the price at 250W. Thanks.

  • @pidojaspdpaidipashdisao572
    @pidojaspdpaidipashdisao572 3 months ago

    I always had only one question for you: why do you drink beers (or whatever that is) from a glass? Why not the bottle, or the can in this case? I feel like less of a man when I drink it out of a glass.

    • @CraftComputing
      @CraftComputing 3 months ago

      Glossing over the strange identity crisis you seem to be having: a glass lets you smell the beer far better than a can or bottle. Secondly, pouring a beer with a head brings out more flavors. A nucleated glass also helps refresh the head, making your beer more enjoyable longer.
      As for your latter comment, I think it's queer to let others' opinions of you define your identity. Next time you're at a bar, order that Cosmo you've always wanted.

    • @pidojaspdpaidipashdisao572
      @pidojaspdpaidipashdisao572 3 months ago

      @@CraftComputing Making a science of the orange juice that you drink, mfw. Nobody defines me; we all know who drinks out of a glass. What is a Cosmo?

    • @CraftComputing
      @CraftComputing 3 months ago

      Who drinks out of a glass?

  • @rklauco
    @rklauco 3 months ago

    Maybe a stupid question: when you calculated the price, did you include the Windows license in it? I'm not sure if my information is correct, but I thought you need a special (and quite expensive) Windows 11 license to run it in a VM. But it's possible I'm wrong and there is some option to get it without the $100+ license...

    • @CraftComputing
      @CraftComputing  3 months ago

      When I'm running tests like this, I often run Windows without a license key. No sense purchasing a Windows license for a VM that won't exist in two months. For long-term deployment, grab an OEM license key. They're possible to snag for $10-15.

    • @rklauco
      @rklauco 3 months ago

      @@CraftComputing I thought these OEM keys are not in line with MS licensing, and their license (while technically working) doesn't allow you to virtualize the machine; it should only run on bare metal. But again, I'm not a Windows licensing expert.

  • @AwSomeNESSS
    @AwSomeNESSS 3 months ago +1

    Now I'm wondering how these run on Chinese X99 with Turbo Boost Unlock on Xeon V3 CPUs. E.g., a 2699 V3 runs at 3.2-3.4GHz under TBU at full load; 18c/36t would run four equivalent 4c/8t machines with 2c/4t to spare for the bare metal OS, and with 128GB that's 28GB per VM plus 16GB for the bare metal. Could have the base system up and running for ~$450ish + the cost of GPUs.
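
The partitioning math in this comment can be sanity-checked with a quick script (pure Python; the core, thread, and RAM figures are the commenter's proposal, not measured values):

```python
# Sanity-check the proposed split of an 18c/36t Xeon E5-2699 v3 host
# into four gaming VMs plus a slice reserved for the hypervisor.
TOTAL_CORES, TOTAL_THREADS = 18, 36
TOTAL_RAM_GB = 128

VM_COUNT = 4
VM_CORES, VM_THREADS = 4, 8      # 4c/8t per VM, as proposed
VM_RAM_GB = 28                   # 28GB per VM, as proposed

HOST_CORES, HOST_THREADS = 2, 4  # 2c/4t left for the bare-metal OS
HOST_RAM_GB = 16

used_cores = VM_COUNT * VM_CORES + HOST_CORES
used_threads = VM_COUNT * VM_THREADS + HOST_THREADS
used_ram = VM_COUNT * VM_RAM_GB + HOST_RAM_GB

print(f"cores:   {used_cores}/{TOTAL_CORES}")
print(f"threads: {used_threads}/{TOTAL_THREADS}")
print(f"RAM:     {used_ram}/{TOTAL_RAM_GB} GB")

assert used_cores <= TOTAL_CORES
assert used_threads <= TOTAL_THREADS
assert used_ram <= TOTAL_RAM_GB
```

The split uses the host exactly: 18 of 18 cores, 36 of 36 threads, and 128 of 128 GB.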

    • @CraftComputing
      @CraftComputing  3 months ago +1

      th-cam.com/video/ngW_FI4PPZk/w-d-xo.html

    • @AwSomeNESSS
      @AwSomeNESSS 3 months ago +1

      @@CraftComputing Man, that's quite a throwback! Peak of when you were reviewing Chinese parts every few videos.
      Hopefully Turing comes down in price in the next couple of years; it would be interesting to do a revisit with more GPU grunt down the road. Top-end Tesla/Quadro Turing is still ~$2000 CAD.

    • @CraftComputing
      @CraftComputing  3 months ago +1

      No idea why Turing GPUs are still so expensive. You can snag an A5000 for less than $1200 for 2x the performance of an RTX 6000.

    • @AwSomeNESSS
      @AwSomeNESSS 3 months ago +1

      @@CraftComputing That is weird. Must be connected to contract pricing or the like (e.g., not enough Turing supply has hit the market yet). It can probably be expected to bottom out as more companies move to Lovelace/Hopper/Blackwell. A single A5000 + Chinese X99 V3 setup would be an interesting proposition for an all-in-one server: two 8c/16t VMs with 1/2 an A5000 each + a 2c/4t Proxmox server. Add in a cheap A310 for Plex and you'd have a decent home lab setup started.

    • @CraftComputing
      @CraftComputing  3 months ago +3

      I've got a pair of A5000s, and you'll be seeing them shortly here on the channel ;-)

  • @Agent_Clark
    @Agent_Clark 3 months ago

    Where and how might I get more information on a server like this? I'm interested in building one but only have experience with mostly consumer hardware.

  • @mrsrhardy
    @mrsrhardy 2 months ago

    The cards don't have video out, so you need an onboard GPU (say, Intel's). So how do you get Windows 10/11 to use the GPU for gaming (assuming Steam)? I know Intel's Quick Sync is good, but in apps like DaVinci Resolve, can the Nvidia GPU alternative be selected? I ask because I know you often use these in VM environments and do passthrough for HW/GPU support, so obviously it's selectable from a level 1 hypervisor. But what about plebs like us mere mortals with an SFF desktop with integrated Intel graphics: is the P4 a nice affordable boost, or more trouble than it's worth?

  • @Seventeen76
    @Seventeen76 3 months ago

    Is a used second-gen Threadripper good for machine learning/AI? I was considering hooking up a system with one, or trying to get an Epyc CPU. Are those CPUs any better than just regular Ryzen for those intended purposes?

  • @cgrosbeck
    @cgrosbeck 2 months ago

    Do you have a how-to for setting up your hardware? Specifically the OS, drivers, and networking to terminals like Raspberry Pis.

  • @haylspa
    @haylspa 3 months ago +1

    Can you put Tesla P40s or P10s in SLI with a Titan Xp or X? This is a question I have because I am building a Godlike MSI X99 platform.

    • @CraftComputing
      @CraftComputing  3 months ago +1

      No

    • @haylspa
      @haylspa 3 months ago

      @@CraftComputing Thank you! have a blessed day!!

    • @VinnyG919
      @VinnyG919 3 months ago

      You may be able to with DifferentSLIAuto

  • @HPTRUE
    @HPTRUE 15 days ago

    Great video!

  • @ewenchan1239
    @ewenchan1239 3 months ago

    There isn't a standard way of benchmarking GPUs for AI that's meaningful for homelabbers.
    You can run the HumanEval benchmark, for example, but the score is practically meaningless (as it is used more for benchmarking the MODELS than the hardware that said model runs on).

  • @ronaldvanSluijs
    @ronaldvanSluijs 3 months ago

    I have a Dell R730 with a recently bought GRID K2 card in it and have been struggling forever with it. I have it recognized in Proxmox and in my Windows Server 2019 VM. It is also showing up in Plex as a transcoder option. But somehow Plex does not seem to want to use the video card and transcodes with the CPU instead. I see you have a lot of experience with this; did you find a solution to this with your previous build?

  • @DanielPersson
    @DanielPersson 3 months ago

    I have benchmarks for newer cards. If you want to collab on a video about AI inference or training, I could help out.

  • @playeronthebeat
    @playeronthebeat 3 months ago

    Will you do one more video for Turing/Volta cards (essentially the 20xx series), too, or are those still out of reach (budget-wise, etc.)?
    It would be interesting to me if they're not too expensive.

    • @CraftComputing
      @CraftComputing  3 months ago +2

      Yep! I've got some V100 and A5000 GPUs lined up. Not sure if I'll cover Turing, as those are prohibitively expensive still.

    • @playeronthebeat
      @playeronthebeat 3 months ago

      @CraftComputing Ah, that's unfortunate.
      Would still love to see it, honestly. The V100s don't seem too expensive on their own. Still, they'd definitely stretch the budget quite a bit, going for ~€700 here for the 16GB SXM2 and roughly €1k more for the 32GB SXM3.
      For someone like me toying with the idea of having at most one or two systems on there, it'd be quite cool. But eight systems (4 GPUs) could be a bit harsh regarding the price.

  • @PCsandEVs
    @PCsandEVs 3 months ago +1

    Love your work Jeff thanks!

  • @mrsittingmongoose
    @mrsittingmongoose 3 months ago

    Is the stuttering in every single game just the video? Or are they actually that stuttery?

  • @frankenstein3163
    @frankenstein3163 3 months ago

    A little off subject: how do you send the cloud gaming around 200 ft?

  • @jasontechlord
    @jasontechlord 3 months ago

    Looking at possible AMD solutions and it seems all those cards have just enough power to render the crickets of the AMD server card market.

    • @CraftComputing
      @CraftComputing  3 months ago +1

      I've got some AMD GPUs, and will be covering them in an upcoming video as well.

  • @masoudakbarzadeh8393
    @masoudakbarzadeh8393 a month ago

    I bought a Tesla K80 and an RX 580. Can I power and use both at the same time?

  • @spotopolis
    @spotopolis 3 months ago

    With how old the P4 is at this point, how would an Intel Arc A310 stack up to it? It's half the VRAM, but its clock speeds are double that of the P4. Do you think the lower-powered card with the newer architecture would have a chance?

    • @CraftComputing
      @CraftComputing  3 months ago

      Oof... The A310 and A380 don't hold up well for rasterization performance. They absolutely win when it comes to video encode/decode though. Depending on your needs, they're a solid option.

  • @TheAnoniemo
    @TheAnoniemo 3 months ago

    How were the temperatures on the P4? I know they have some very specific airflow requirements due to the small restrictive heatsink.

    • @CraftComputing
      @CraftComputing  3 months ago

      This server is specifically designed for passive GPUs. The P4 ran at ~45C. The P40 and P100 ran between 55-62C.

    • @TheAnoniemo
      @TheAnoniemo 3 months ago

      @@CraftComputing Thanks for the reply. I was wondering because I know we had some issues at work when installing a single T4 and no other expansion cards. The perforated back of the chassis provided too little restriction, so all the air just went around the T4 instead of being forced through it. It would subsequently throttle like crazy...

  • @Adam130694
    @Adam130694 2 months ago

    Just put two 2696 v3s in there (for $50-70 a piece), unlock them, and have ~3.6GHz clocked CPUs with the same 72 threads?

    • @CraftComputing
      @CraftComputing  2 months ago

      The unlock is still power limited. Under full load, the CPUs would still likely struggle to hit 2.8GHz or higher.

    • @Adam130694
      @Adam130694 2 months ago

      @@CraftComputing I've seen them hitting 3.5-3.6 in games quite frequently... but you being someone with more experience, I believe you tested that. Good job anyways, and as always!

  • @adamtoth9114
    @adamtoth9114 3 months ago

    Send me those cards and I'll give you some step/sec results in TensorFlow training 😃.
    Same dataset, multiple runs with different batch sizes for each card. I used a K80 for this lately, and I have a well-established test environment in Docker for it.
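
For anyone wanting to reproduce a steps/sec comparison, the timing loop itself is framework-agnostic. A minimal sketch in pure Python, with a hypothetical dummy `train_step` standing in for a real TensorFlow training step (swap in your framework's step function):

```python
import time

def train_step(batch_size):
    # Dummy CPU workload standing in for a real framework training step.
    total = 0
    for i in range(batch_size * 1000):
        total += i * i
    return total

def steps_per_sec(step_fn, batch_size, warmup=3, steps=20):
    # Warm up first so one-time setup cost doesn't skew the measurement.
    for _ in range(warmup):
        step_fn(batch_size)
    start = time.perf_counter()
    for _ in range(steps):
        step_fn(batch_size)
    elapsed = time.perf_counter() - start
    return steps / elapsed

# Multiple runs with different batch sizes, as described above.
for bs in (16, 32, 64):
    print(f"batch {bs:3d}: {steps_per_sec(train_step, bs):8.2f} steps/sec")
```

With a real GPU step, make sure the step function blocks until the device finishes (e.g. by fetching the loss value), or the timing only measures kernel launch overhead.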

  • @cyklondx
    @cyklondx 3 months ago

    Hi, disable ECC memory on the P100
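
For reference, a sketch of how that's done with `nvidia-smi` (assuming a standard NVIDIA driver install; requires root, and the ECC change only applies after a reboot):

```shell
# Show the current ECC mode for each GPU
nvidia-smi --query-gpu=index,name,ecc.mode.current --format=csv

# Disable ECC on GPU 0; on a P100 this also frees the VRAM
# otherwise reserved for ECC bookkeeping
sudo nvidia-smi -i 0 -e 0

# The new ECC mode takes effect on the next reboot
sudo reboot
```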

  • @0mnislash79
    @0mnislash79 3 months ago

    No fighting game test to also see the input lag with 2 VMs 😟

  • @AlexTheStampede
    @AlexTheStampede 3 months ago

    Good on you for not doing the AI tests if you don't feel comfortable with your knowledge on the subject! I've seen Whisper (the normal model, I think it was?) running faster than "real time" on an Intel N95 CPU, just like I've seen people call GPUs that generate an image in 10 seconds "unusable". That said, while my experience is now several patches old... Starfield EATS CPU. I know my Ryzen 5 3600 is actually a bottleneck for the RTX 3060 while running Starfield, with the frame rate hardly improving as resolution and quality drop. Hilariously (or sadly?), that system also runs Baldur's Gate 3 less smoothly than my M2 Mac Mini with just 8GB of system RAM, showing another game that devours CPU. Again, pointing out that the above was several patches ago; things might have changed.

    • @CraftComputing
      @CraftComputing  3 months ago +1

      Starfield does hate CPUs, but the 1080 Ti test I referenced was running on a 7800X3D, and the Pascal card was still awful compared to newer cards.

  • @michaelwillman5342
    @michaelwillman5342 3 months ago

    You don't account for the x8 vs. x16 slots for the P4s. You need to run each card at x8 if you have 8 of them, but that could bottleneck it (or not?)

    • @CraftComputing
      @CraftComputing  3 months ago

      I was running the P4 on an x8 slot. And trust me, there's more than enough bandwidth on an x8 for that GPU.

    • @michaelwillman5342
      @michaelwillman5342 3 months ago

      @@CraftComputing 1 in 1 slot is not the same as 8 in 8 slots.

    • @CraftComputing
      @CraftComputing  3 months ago

      The (4) x8 slots on each side of the server are directly wired to each of the two CPUs. It's not a shared bus or PLX-Split lanes. Every slot is dedicated x8.
      So yes, there is plenty of bandwidth.

    • @protator
      @protator 3 months ago

      @@michaelwillman5342 That server has 80 (!) PCIe lanes from the CPUs alone, plus the chipset. Where do you see the risk of a bottleneck in this setup?
      With server- and workstation-class CPUs you don't have to worry about bandwidth limitations like on gaming/consumer platforms, where 16 or 20 lanes get spread over the entire board via bridge chips.
      I run a similar setup with two E5-2696 v3s, with one card running full x16 and 6 accelerators at x8. It makes no difference whether I put load on a single component or decide to go full bore and draw 1200W... as long as the chosen CPUs can keep up in terms of performance per core/thread, such a setup works fine.
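
The lane budget described here is easy to check on paper (numbers as stated in the comment: 40 CPU lanes per socket, two sockets, one x16 card plus six x8 accelerators):

```python
# PCIe lane budget for a dual Xeon E5 v3 (Haswell-EP) board.
LANES_PER_CPU = 40
CPUS = 2
total_lanes = LANES_PER_CPU * CPUS   # 80 lanes, before chipset lanes

# The layout described above: one full x16 card + six x8 accelerators.
slots = [16] + [8] * 6
used = sum(slots)

print(f"used {used} of {total_lanes} CPU lanes")
assert used <= total_lanes  # 64 of 80: no oversubscription
```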

  • @pachete.
    @pachete. 3 months ago

    It's cool, but I don't have a need for a Tesla GPU. I think my 4650G server is enough for me.

  • @claucmgpcstuf5103
    @claucmgpcstuf5103 3 months ago +1

    Well, that was very educative... but it would be cool vs. an RTX 3080: can it do 60fps in 4 VMs? That would be interesting, yes!

    • @CraftComputing
      @CraftComputing  3 months ago +1

      Unfortunately, Ampere and Ada consumer cards can't be unlocked for Nvidia GRID support. But there is some hope with Server 2025... stay tuned.

    • @claucmgpcstuf5103
      @claucmgpcstuf5103 3 months ago

      @@CraftComputing There has to be a way!! I remember this from Linus... th-cam.com/video/LXOaCkbt4lI/w-d-xo.htmlsi=LyJskfm3EQ9IkqZQ ... he did it on Crysis 3 and AMD!! I get what you are trying to do... cash is cash, totally... but in your case the 75W T4 would be the logical step if you want to continue with this: to do 100 VM slots for, what, $10 per month each, even a full $1000 per month, then upgrade in 12 months! Even from just 8 VMs, or 16, that would mean you get the investment back in 24 months with 16 VM slots, then max out the system. It is an investment, totally; you have to think on it. To get some profit you have to scale up: from 8 to 16 to 1000 to 10000, max, a full space station :)!! (and then save your cash until next time!!) ... Those would be OK at a fluent 60 fps in all of them... with the app etc... at least at 1080p medium... or just test Doom 2016, the most well-optimized title, on ultra. Awesome!! It will be the end of an era when that disaster RTX ray tracing really gets adopted, in 20 years or ever at all!!
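
The payback math buried in this comment can be made concrete with a quick calculation; all the dollar figures below are the commenter's hypotheticals, not real pricing:

```python
# Rough break-even estimate for a rented cloud-gaming box,
# using the commenter's hypothetical $10/month-per-slot figure.
PRICE_PER_SLOT_MONTH = 10

def months_to_break_even(hardware_cost, vm_slots):
    monthly_revenue = vm_slots * PRICE_PER_SLOT_MONTH
    return hardware_cost / monthly_revenue

for slots in (8, 16):
    # $2000 hardware cost is a placeholder, not a quote.
    months = months_to_break_even(2000, slots)
    print(f"{slots} VM slots: break even in {months:.1f} months")
```

Doubling the slot count halves the break-even time, which is the scaling argument the comment is gesturing at.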