AI Server Hardware Tips, Tricks and Takeaways

  • Published Jan 31, 2025

Comments • 149

  • @DigitalSpaceport
    @DigitalSpaceport 2 months ago +4

    Writeup - digitalspaceport.com/homelab-ai-server-rig-tips-tricks-gotchas-and-takeaways/

    • @d3vr4ndom
      @d3vr4ndom 1 month ago

      @DigitalSpaceport I've watched your videos and looked at your website. I appreciate your content, but I find it very incomplete. I would appreciate it more if you could answer my questions here. You could also update your website with the info. It would really help me and the community understand how to invest in hardware instead of making expensive mistakes. I understand it may be too much to ask of you, since you're not an employee of mine, but your kindness in pointing me to the right resource or help would be greatly appreciated and needed by the community, since you have the hardware.
      Background:
      I’m willing to spend 25k on a A.I setup.
      This setup will be used to run LLMs ex. Ollama, in addition to Flux 1 Image generator, and a video generation model yet to be decided.
      My plan is to run CPT (Continuous Pre-Training), which just means further training a model that already exists to remove bias and emphasize what matters for my use cases. This training will be done on both LLM and image generation models, with video models in the future.
      Inference will also need to be done on all these models, and I will be aiming to get the most T/s (tokens per second), but also the best training times, as training is what will consume the greatest amount of time: as stated in the Llama white paper, the 70B model took 2,048 H100s running 24/7 for 21 days to finish. At cloud pricing, that's roughly a two-million-dollar model (see the rough cost sketch after this comment).
      In your video you never test CPT, LoRA (Low-Rank Adaptation) generation, or image/video inference. The hardware requirements for CPT, LoRAs, and inference of each model type will differ from plain text inference in Ollama. This leaves a large gap in the different types of hardware utilization that may occur under different workloads and conditions, especially when dealing with model types other than text models.
      My Questions:
      How does the CPU affect CPT training and LoRA generation in comparison to inference? Again, not just text models but image and video models as well (cores, clock speed, cache).
      How does RAM affect CPT training and LoRA generation in comparison to inference? Again, not just text models but image and video models as well. I know you said RAM speed doesn't matter, but I only saw that tested for text inference. I would like to see the following: number of channels, capacity, and speed.
      How do GPUs affect CPT training and LoRA generation in comparison to inference? Again, not just text models but image and video models as well. I really want to understand in which workloads you need an x16 connection on each GPU, and what the performance difference is. How does scaling perform across multiple GPUs? For example, do 2x 3090s perform twice as well in training as 1x 3090? If not, what's the performance decrease as you scale from 1 to 8? Can you get away with only x8 or x4 PCIe bandwidth, if not with Gen 4 then maybe Gen 5, and what's the performance decrease? Perhaps it's just a performance hit when loading the model?
      As for storage, did you see any offloading of the SafeTensors file or checkpoints overwhelming your drive? What GB/s is needed during training so that your GPUs dumping data onto your SSD doesn't become a bottleneck?
      I have a few more questions but maybe I’m asking too much already eh?
      If you see this then it would be much appreciated by myself and hopefully helpful for others to see your responses.
      Cheers.
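
For reference, a quick back-of-envelope check of the training-cost figure quoted above. The ~$2/GPU-hour cloud rate is an assumption for illustration only; real H100 pricing varies widely.

```python
# Rough sanity check of the "2048 H100s for 21 days ~ $2M" figure above.
# The $2.00/GPU-hour cloud rate is an assumed placeholder, not a quoted price.
gpus = 2048
days = 21
rate_usd_per_gpu_hour = 2.00  # assumption for illustration

gpu_hours = gpus * days * 24
cost_usd = gpu_hours * rate_usd_per_gpu_hour
print(f"{gpu_hours:,} GPU-hours -> ~${cost_usd / 1e6:.1f}M")
# 1,032,192 GPU-hours -> ~$2.1M, in line with the ~$2M estimate above
```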

  • @MeidanYona
    @MeidanYona 2 months ago +4

    This is very helpful! I buy most of my hardware from Facebook Marketplace and I often have to wait long spans between getting components, so knowing what to watch out for is very important.
    Thanks a lot for this!

    • @minedustry
      @minedustry 1 month ago +1

      I also play a long game, acquiring hardware with premeditated upgrade paths. My old daily PC became the home theater PC. My current daily PC will become the game server, and I guess my next PC becomes an AI LLM machine.

  • @anthonyperks2201
    @anthonyperks2201 23 days ago +1

    I'm loving these well-researched videos. My partner is a mental health professional, so LLM queries with personal data on public systems are not an option. I'm working out whether I'll build one of these systems next so that she can use the tools safely and privately. I've run Linux workstation setups before, and I have been using VMware for more than ten years, but it's looking like Proxmox is the preferred option. It seems like AI finally got virtualization tools to handle GPU passthrough, which has been notoriously bad for gaming for some time. Hmmm... Appreciate all of the information.

  • @LucasAlves-bs7pf
    @LucasAlves-bs7pf 2 months ago +1

    Great video! The most eye-opening takeaway: having two GPUs doesn’t mean double the speed.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +3

      Hands down the #1 question in videos. Not with llama.cpp yet, but hopefully soon. Bigger models, and running models on separate GPUs at the same time, are the current reasons, and running bigger models like Nemotron is a big quality step. Or use vLLM, which isn't as end-user friendly as Ollama/OWUI.
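
For context, a minimal sketch of how a single model is typically split across two GPUs for inference with llama-cpp-python: the layers are divided between the cards, so the VRAM adds up but the tokens generated per second do not double. The model path and the 50/50 split ratio are placeholders.

```python
from llama_cpp import Llama

# Minimal sketch: offload all layers and divide them across two GPUs.
# Model path and split ratio are placeholders for illustration.
llm = Llama(
    model_path="models/nemotron-70b.Q4_K_M.gguf",  # hypothetical local file
    n_gpu_layers=-1,           # offload every layer to GPU
    tensor_split=[0.5, 0.5],   # fraction of layers per GPU (GPU0, GPU1)
    n_ctx=8192,
)
out = llm("Why doesn't a second GPU double my tokens per second?", max_tokens=64)
print(out["choices"][0]["text"])
```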

    • @gaiustacitus4242
      @gaiustacitus4242 1 month ago

      Why would this be eye-opening? Of course having multiple GPUs does not result in linear scaling. You can't get close to linear scaling on any system where multiple chips share the processing, even on the same die. When it comes to GPUs like the NVIDIA RTX series, there is the latency of the computer's bus that will slow data transfer.

    • @geelws8880
      @geelws8880 1 month ago

      Well it depends on the architecture of your AI. You could have a modular approach.

  • @UnkyjoesPlayhouse
    @UnkyjoesPlayhouse 2 months ago +10

    dude, what is up with your camera, feels like I am drunk or on a boat :) another great video :)

  • @danielstrzelczyk4177
    @danielstrzelczyk4177 2 months ago +1

    You inspired me to experiment with my own AI server based on a 3090/4090. I made slightly different choices: ASRock WRX80D8-2T + Threadripper Pro 3945WX. As you mentioned, CPU clock speed matters, and I got a brand new motherboard + CPU for around 900 USD. I also want to try OCuLink ports (ASRock has 2 of them) instead of risers. There are 2 advantages: OCuLink offers flexible cabling and works with a separate power supply, so you are no longer dependent on a single expensive PSU. So far I see 2 problems: the Intel X710 10GbE ports cause some errors under Ubuntu 24.04, and the Noctua NH-U14S is too big to close a Lian Li O11 XL, so I have to turn to an open-air case. Can't wait to see your future projects.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago

      On the Intel, if that's the fiber X710, do you have approved optics?

    • @MetaTaco317
      @MetaTaco317 2 months ago

      @@danielstrzelczyk4177 I've been wondering if OCuLink would find its way into these types of builds. Wasn't aware the ASRock mobo had 2 ports like that. Have to check that out.

    • @danielstrzelczyk4177
      @danielstrzelczyk4177 2 months ago +1

      @@DigitalSpaceport I use the X710 copper connection, but I just figured out that I shouldn't have blamed the Intel NICs for the repeating "Activation of network connection failed". The source of my issue was the virtual USB Ethernet (American Megatrends Virtual Ethernet) created for IPMI. This is strange, as I use a separate Ethernet cable connection for IPMI, but after disabling "Connect automatically" in the USB Ethernet profile it all returned to normal.

    • @danielstrzelczyk4177
      @danielstrzelczyk4177 2 months ago +1

      @@MetaTaco317 Yes, there are 2 ports out of the box, but I have also seen PCIe cards with OCuLink and converters from an M.2 slot to OCuLink, so there are a couple of options. The ADT-Link F9G just arrived, so let's see how it works :)

  • @christender3614
    @christender3614 2 months ago +2

    Been waiting for that one and happy to write the first comment!

  • @viniciusmoura9105
    @viniciusmoura9105 25 days ago +1

    Thank you for the insights. I currently have 2x RTX 3090s NVLinked and I'm interested in starting to experiment with local LLMs. Do these models benefit from memory pooling at all? If so, would you know what model sizes I could run with such a rig?
    Both GPUs are running at x8/x8 PCIe 5.0 on a system with an Intel 13700K and 128GB DDR5 @ 4800 MHz.
    Thanks in advance for your help.
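
A rough sizing sketch for questions like the one above. For llama.cpp/Ollama-style inference the layers are typically split across both cards whether or not NVLink is used, so roughly the combined 48GB is what matters; the bytes-per-weight figures and the 20% overhead below are approximations, not exact numbers.

```python
# Rough VRAM estimate: weights ~= params * bytes/weight for the quant,
# plus ~20% headroom for KV cache and runtime overhead. All approximate.
BYTES_PER_WEIGHT = {"fp16": 2.0, "q8": 1.0, "q6": 0.75, "q4": 0.55}

def vram_gb(params_b: float, quant: str, overhead: float = 0.20) -> float:
    return params_b * BYTES_PER_WEIGHT[quant] * (1 + overhead)

for params in (8, 14, 32, 70):
    need = vram_gb(params, "q4")
    verdict = "fits" if need <= 48 else "too big"
    print(f"{params}B @ q4: ~{need:.0f} GB -> {verdict} in 2x 24GB")
# A 70B model at q4 lands in the mid-40s of GB, so it is borderline on 48 GB
# once the context window is given room to grow.
```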

  • @SaveTheBiosphere
    @SaveTheBiosphere 2 months ago +2

    The 4060 Ti with 16GB can be had new for $449. It's built for x8 on PCIe, so it's perfect for bifurcation off an x16. The memory bus is slower (128-bit), but for AI use it seems like hands down the best bang-for-the-buck card? 165 watts max draw. (PNY brand on Amazon, in stock.)

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      Great comment! You got me thinking this morning and I wrote out a detailed response, but for a dual to quad setup these are a compelling route. th-cam.com/users/postUgkx60ENLNTIkOA49lFHXJIhr4yNdYHH2gib?si=BcSKVmujjfdPigly

  • @doesthingswithcomputers
    @doesthingswithcomputers 21 days ago

    For the Intel Extension for TensorFlow/PyTorch, Radeon Open Compute (ROCm), and Compute Unified Device Architecture (CUDA), carefully research your use case and map it to the device you want before you buy. Sometimes a lower CUDA compute capability can work for your task, but it depends on what you are trying to do.
    FYI, I am going to mimic your setup before August, hopefully with quad Arc A770s.
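
A minimal sketch of the kind of pre-purchase check described above, using PyTorch's standard device queries (these calls also report devices on ROCm builds of PyTorch). It only shows what the installed stack can see, not whether a given workload is supported.

```python
import torch

# Print what the installed PyTorch build actually sees, including the
# compute capability that frameworks and kernels gate features on.
if torch.cuda.is_available():  # true for CUDA builds and ROCm (HIP) builds
    for i in range(torch.cuda.device_count()):
        name = torch.cuda.get_device_name(i)
        major, minor = torch.cuda.get_device_capability(i)
        total_gb = torch.cuda.get_device_properties(i).total_memory / 1e9
        print(f"GPU {i}: {name}, compute capability {major}.{minor}, {total_gb:.0f} GB")
else:
    print("No CUDA/ROCm device visible to this PyTorch build")
```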

  • @dorinxtg
    @dorinxtg 2 months ago +11

    I didn't understand why you didn't mention any of the Radeon 7xxx cards, nor ROCm

    • @ringpolitiet
      @ringpolitiet 2 months ago +14

      You want CUDA for this.

    • @christender3614
      @christender3614 2 months ago

      It's preferable. AFAIK, Ollama isn't yet optimized to work with ROCm. Would've been interesting though, like "how far do you get with AMD". AMD is so much more affordable per GB, especially when you look at used stuff. Maybe that's something for a future video, @DigitalSpaceport ?

    • @christender3614
      @christender3614 2 months ago +3

      My comment vanished. Could you make a video on AMD GPU? Some people say they aren’t that bad for AI.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +18

      I see two comments here and do plan to test AMD and Intel soon.

    • @slowskis
      @slowskis 2 months ago +3

      @@DigitalSpaceport I have a bunch of A770 16GB cards along with ASRock H510 BTC Pro+ motherboards sitting around. Was thinking of trying to make a 12-card cluster connected by 10Gb network cards, with a 10900K for the CPU and the 3 nodes linked to each other. Any problems you can think of that I am missing? 4 GPUs per motherboard with two 10Gb cards. The biggest problem I can think of would be the single 32GB RAM stick that the CPU is using.

  • @coffeewmike
    @coffeewmike 2 months ago +1

    I am doing a build that is about 60% aligned with yours. Total investment to date is $7200. My suggestion if you have a commercial use goal is to invest in the server grade parts.

  • @jack6539
    @jack6539 29 days ago

    Excellent video. Subbed. I am rocking 2x 1070 FEs on Debian. Do I need to set up SLI to double the VRAM, or can it be done with llama.cpp?

  • @StefRush
    @StefRush 2 months ago

    I built my AI lab to test and was shocked how fast it was.
    4 x Intel(R) Core(TM) i5-2400 CPU @ 3.10GHz (1 socket)
    RAM usage 71.93% (11.22 GiB of 15.60 GiB) DDR3
    proxmox-ve: 8.3.0 (running kernel: 6.8.12-4-pve)
    NVIDIA GeForce GTX 960, PCIe Gen 1 @ 16x, 4GiB
    "write python code to access this LLM"
    response_token/s: 24.43
    "create the snake game to run in python"
    response_token/s: 21.38
    You did a great job with your tutorials. Thanks, I'm going to get some 3060s now.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago

      Thanks, those 12GB 3060s are cost-wise probably top 3 IMO for VRAM/$.

  • @squoblat
    @squoblat 2 months ago +1

    At what point does an A100 80GB become viable? They are starting to drop in price now. Mulling over either an RTX 6000 Ada or waiting a bit longer and going for an A100 80GB. Currently running 2x RTX A5000; the lack of single-card VRAM is much more of a limit than much of the online world seems to point out.

    • @DigitalSpaceport
      @DigitalSpaceport 1 month ago +1

      "Honey, we are getting an A100 cluster" doesn't pass my gut check on things I can casually drop at dinner. Not yet. I'm on eBay looking these up now, however, and you're right, prices on them are down.

    • @squoblat
      @squoblat 1 month ago

      @DigitalSpaceport Looking forward to the video if you ever do get one. Having a compute card makes a lot more sense as models get bigger. The self-hosted scene is going to get pushed out if things like the A100 stay very expensive.

    • @gaiustacitus4242
      @gaiustacitus4242 1 month ago

      You've hit the nail on the head. VRAM is the most important factor. Unless you are looking to run at most a 7B-parameter LLM (which IMO is pointless), an NVIDIA RTX-based system will yield disappointing performance. Even an M3 Max with 128GB RAM will perform better on models requiring more than 24GB RAM, and the M4 Max offers still better performance.

  • @tringuyen0992
    @tringuyen0992 1 month ago

    Hi, great video. How about the NVIDIA A6000?

  • @TheColonelJJ
    @TheColonelJJ 1 month ago

    With the end of SLI/NVLink, is there hope that a home PC build will run two GPUs with Windows? I want to add a second 3060 12GB to my Z790 i7-14900K and use it for Stable Diffusion/Flux while also running an LLM alongside for prompting. Am I forced to move to Linux?

  • @lucianoruiz2057
    @lucianoruiz2057 2 months ago

    I loved this video! Very helpful information! I have a question: what is the difference in performance between using PCIe Gen 3 x16 vs. PCIe Gen 4 x16? I have a few 3090s and also some Dell T7810s with dual x16, but PCIe Gen 3 :(

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      Thanks. For inference speed, slot gen and width don't have an impact that is meaningful at all. Those T7810s are G2G!

    • @lucianoruiz2057
      @lucianoruiz2057 2 months ago

      @@DigitalSpaceport Nice! What about training with x16 PCIe 3.0 vs. x16 PCIe 4.0? Did you try?

  • @SaveTheBiosphere
    @SaveTheBiosphere 2 months ago

    What are your thoughts on the AMD Strix Halo releasing in January at CES? It's an APU with 16 Zen 5 cores, an NPU, and a GPU all on one chip, plus 64GB of on-package LPDDR5X. Also called the Ryzen AI Max+ 395. Targeting AI workstations, the January release version will have 64GB on board that can be allocated to AI models (the OS would need some of the 64). A 128GB version is due Q3 2025.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago

      AMD doesn't lack for hardware. It's always the software/drivers/kernel that's lacking. It's been better, but NVIDIA is the sure thing. Hopefully they have some good work going into the software side for this! I hope to test one out eventually.

  • @xarisathos
    @xarisathos 12 days ago

    Hi there! I watched the entire video, but I was looking to find out what to look for in a second-hand modern HP workstation, from which I would like to extract the CPU (Zen 4 Threadripper Pro) and then use it in an entirely new build. Is that possible, or are OEM CPUs locked somehow?
    Thank you

  • @DanBYoungOldMiner
    @DanBYoungOldMiner 1 month ago

    I had a Gigabyte X399 Designare EX 1.0 with a Threadripper 1900X and 128GB 3200MHz memory just laying around and not doing much. Was able to get 6 Asus TUF RX 6800 XTs into a modded RSV-L4500U 4U server case, all GPUs at PCIe 3.0 x8. I will next install ROCm from AMD, as they have a very large catalog. Would be nice to have some content on AMD RX 6000 and RX 7000 GPUs, as AMD GPUs are very capable, but there is not much content out there.

    • @DigitalSpaceport
      @DigitalSpaceport 1 month ago

      Let me know how it goes. I'm interested in getting AMD GPU content on the channel.

  • @FahtihiGhazzali
    @FahtihiGhazzali 2 months ago

    I love this video. I've learned so much in such a little time.
    question 1: blah blah blah
    answer: no, vram is more important
    question 2: blah blah blah
    answer: no, vram!
    other questions:
    no, vram! 😊

  • @keoghanwhimsically2268
    @keoghanwhimsically2268 2 months ago

    How does a 4x3090 compare to 2xA6000 for training/inferencing for different model sizes?
    (A6000 is more like 3090 Ti in terms of CUDA but with 48GB VRAM, though with a lower power draw for a single card than even the 3090 since it’s optimized for pro workloads and not gaming. Downside: it’s 2x the cost per GB of VRAM compared to 3090.)

  • @Boyracer73
    @Boyracer73 2 months ago

    This is relevant to my interests 🤔

  • @marvinthielk
    @marvinthielk 1 month ago

    If I want to get into hosting some 30B models, would 2x 3060s work, or would you recommend a 3090?

  • @TheYoutubes-f1s
    @TheYoutubes-f1s 2 months ago

    Have you seen any inference benefit to using CPUs with a larger L3 cache? Some of the EPYC Milan CPUs have 768 MB of L3 cache. I wonder if it has an effect when the model can't fully fit in VRAM.

    • @DigitalSpaceport
      @DigitalSpaceport 1 month ago

      I didn't in testing on the 7995WX, which may be a video you're interested in: th-cam.com/video/qfqHAAjdTzk/w-d-xo.html

  • @hassanullah1997
    @hassanullah1997 2 months ago +1

    Any advice on a potential local server for a small startup looking to support 50-100 concurrent users doing basic inference/embeddings with small-to-medium-sized models, e.g. 13B?
    Would a single RTX 3090 suffice for this?

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +2

      This is my guess, so don't hold me to it. I would start with figuring out exactly which model or models you want to run concurrently. You would want to set their timeout to be pretty long, greater than 1 hour, to avoid something like people coming back and all warming it up at the same time. I think you would be better off with 3x 3060 12GBs if that would support the models that you intend to use. If you are looking for any flexibility, then starting with a good base system and adding 3090s as needed is the safest advice. If there is a big impact from undersizing, just go 3090s. Make sure to get a CPU that has good, fast single-thread speed. Adjust your batch size as needed, but the frequency of your users' interactions needs to be observed in nvtop or other more LLM-specific performance monitoring tools.
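
A minimal load-test sketch in the spirit of the reply above: fire a handful of concurrent requests at a local Ollama endpoint and read tokens/sec back from the response timings. The model tag and concurrency level are placeholders; keep-alive time and parallel request handling are normally tuned separately via Ollama's environment settings such as OLLAMA_KEEP_ALIVE and OLLAMA_NUM_PARALLEL.

```python
import json, threading, urllib.request

# Hypothetical concurrency smoke test against a local Ollama instance.
# Model tag, prompt, and concurrency level are placeholders.
URL = "http://localhost:11434/api/generate"   # Ollama's default port
MODEL = "your-13b-model"                      # placeholder tag
CONCURRENCY = 8

def one_user(i: int) -> None:
    body = json.dumps({"model": MODEL, "stream": False,
                       "prompt": "Summarize retrieval-augmented generation in one line."}).encode()
    req = urllib.request.Request(URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    # eval_count tokens generated over eval_duration nanoseconds
    tps = data["eval_count"] / (data["eval_duration"] / 1e9)
    print(f"user {i}: {tps:.1f} tokens/sec")

threads = [threading.Thread(target=one_user, args=(i,)) for i in range(CONCURRENCY)]
for t in threads: t.start()
for t in threads: t.join()
```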

  • @TheYoutubes-f1s
    @TheYoutubes-f1s 2 months ago

    Nice video! What do you think of the ASRock ROMED8-2T motherboard?

  • @ThanadeeHong
    @ThanadeeHong 2 months ago +1

    I set up the motherboard and EPYC CPU just like you.
    May I ask, if you could do it all over again, would you change any of the setup?

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago

      I'm wanting to get a 7F72, but they are expensive and I would need a pair. If I were scratch-building, I would likely have used an air cooler for the CPU as well. Maybe the H12SSL-i would be the board I'd go with, since the MZ32-AR0 has gone up in price a good bit.

  • @timstevens3361
    @timstevens3361 1 month ago

    I need to see a video on how to buy new, affordable parts and build a 2-GPU computer for AI. Nobody is doing that yet. I want to use two RTX 4060 16GB cards for it.

  • @MartinStephenson1
    @MartinStephenson1 2 months ago

    Taking a dual 4090 system as a benchmark for build/running cost, what would be the cost per hour to use a cloud provider with a 48GB RTX A6000? When would it be cheaper to use a cloud service?

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      @@MartinStephenson1 It's multivariate, as individuals' electric rates factor in heavily. Also, how much utilization the system sees factors in heavily. This might be a good video topic; it's pretty complex. I'd also not go with 4090s unless you're doing imagegen/videogen.
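
For illustration, a toy break-even model for the cloud-vs-local question above. Every number in it (hardware cost, power draw, electricity rate, cloud hourly rate) is an assumption; the point is only that utilization and electric rates dominate the answer.

```python
# Toy cloud-vs-local break-even model. All rates below are assumptions
# for illustration; substitute your own hardware, power, and cloud prices.
hardware_usd = 4000.0        # assumed dual-4090-class build cost
avg_draw_kw = 0.9            # assumed average draw while in use
electricity_usd_kwh = 0.15   # assumed local electricity rate
cloud_usd_hour = 0.80        # assumed rented 48GB GPU hourly rate

def local_cost(hours: float) -> float:
    return hardware_usd + hours * avg_draw_kw * electricity_usd_kwh

def cloud_cost(hours: float) -> float:
    return hours * cloud_usd_hour

for hours in (500, 2000, 6000, 12000):
    print(f"{hours:>6} hrs of use: local ${local_cost(hours):>8,.0f}   cloud ${cloud_cost(hours):>8,.0f}")
# With these assumptions the crossover lands in the thousands of utilized hours.
```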

    • @gaiustacitus4242
      @gaiustacitus4242 1 month ago

      @@DigitalSpaceport Why not? NVIDIA's benchmarks show the 4090 to yield more than 4x the TOPS of the 3090. If the model will fit entirely in the VRAM of a single 4090, then that will perform far better. Even the 4080 SUPER offers more than 2.5x the performance of the 3090, though you sacrifice 4GB of VRAM.

    • @DigitalSpaceport
      @DigitalSpaceport 1 month ago

      This has not played out like their benchmarks when I did head-to-head testing of my dual 4090s vs. dual 3090s. Maybe a good subject to revisit in a future video.

  • @HotloadsTTV
    @HotloadsTTV 1 month ago

    I have a mining rig with x1 PCIe lanes. I thought the models required x4 PCIe lanes? You mentioned in the video that it is possible to run a GPU on x1 PCIe for inference. Are there other caveats?

    • @DigitalSpaceport
      @DigitalSpaceport 1 month ago

      Not around the x1 point really. However, I'll drop a note that I didn't test that with the USB risers yet. It *should* not make a difference vs. ribbons, as it's just loading the model into VRAM, much like a DAG workload. However, it would impact training horribly. It may impact some RAG workloads depending on how much you batch into a document store. Let me know how it goes!
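
Rough arithmetic behind the point above: a narrow link mostly stretches the one-time model load, not generation, which is bound by VRAM bandwidth once the weights are resident. Link rates below are nominal and ignore protocol overhead.

```python
# Approximate model-load time over different link widths (nominal rates,
# ignoring protocol overhead). Generation speed is unaffected once loaded.
model_gb = 40  # e.g. a mid-sized quantized model

link_gb_per_s = {
    "PCIe 3.0 x1":  0.985,
    "PCIe 3.0 x16": 15.75,
    "PCIe 4.0 x16": 31.5,
}
for link, bw in link_gb_per_s.items():
    print(f"{link:>13}: ~{model_gb / bw:5.1f} s to load {model_gb} GB")
# x1 turns a couple of seconds of loading into ~40 s: annoying, but a
# one-time cost each time the model is (re)loaded.
```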

    • @НеОбычныйПользователь
      @НеОбычныйПользователь 1 month ago

      @@DigitalSpaceport You can actually test this on your quad-3090 system. Just set PCIe version to 1x in BIOS and check the generation speed in large models and especially the speed of processing large context.

  • @tomoprime217
    @tomoprime217 2 months ago

    Is there a reason you left out the RX 7900 XTX? Is it a bad GPU pick for 24GB?

  • @mikegropp
    @mikegropp 1 month ago

    Is 4 the sweet spot for 3090s? I have 6x 3090s, currently mining crypto. If I wanted to run 2 models, it looks like accuracy would take too big of a hit; I might need 8 to hit 8-bit quant on 2 models.

    • @DigitalSpaceport
      @DigitalSpaceport 1 month ago

      Depends on the model params and context, but for larger 70B ones, yes, 4x 24GB GPUs is often the Q8 range. Those are also some of the best models around right now.

  • @christender3614
    @christender3614 2 months ago

    The most difficult decision is how much money to spend for a first buy. I’m kinda reluctant to get a 3090 config not knowing if I’ll be totally into local AI.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      A 3060 12GB is a good starter then. If you want to go heavy on image/video gen, 24GB is desirable. Local AI is best left running 24/7 in a setup, however, to really get the benefits, with integrations abounding in so many homeserver apps now.

    • @VastCNC
      @VastCNC 2 months ago +1

      Maybe rent a VM with your target config for a little while before you start building?

  • @claybford
      @claybford 2 months ago +1

    Any tips/experience using NVLink with dual 3090s?

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +2

      It's not needed unless you are training, but I need to test on my A5000s that have NVLink so I'm not just being a parrot on that. I did try it out but messed something up IIRC and got frustrated. Will give it another shot soonish.

    • @claybford
      @claybford 2 months ago

      @DigitalSpaceport cool thanks! I'm putting together my new 2x3090 desktop/workstation and I grabbed the bridge so I'll be trying it out soon as well

  • @Keeeeeeeeeeev
    @Keeeeeeeeeeev 2 months ago

    More than DDR4/DDR5 and MT/s, probably the more interesting takeaway would be single vs. dual vs. quad vs. 8-channel performance.

    • @Keeeeeeeeeeev
      @Keeeeeeeeeeev 2 months ago +1

      ...maybe even more so cache speed and quantity...
      What are your thoughts?

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      For sure you want to watch this video! It's the most in-depth test on CPU impacts around, and I've got a pretty crazy 7995WX in it with all 8 channels filled. th-cam.com/video/qfqHAAjdTzk/w-d-xo.html

    • @Keeeeeeeeeeev
      @Keeeeeeeeeeev 2 months ago

      @@DigitalSpaceport I missed that. Thanks, watching right now.

    • @Keeeeeeeeeeev
      @Keeeeeeeeeeev 2 months ago

      Same thoughts... faster cache and higher amounts would be my bet, both on CPU and GPU.
      If I'm not getting something wrong, the fastest GPUs running LLMs (both older and newer models) seem to be those with more cache, higher memory bandwidth, and bigger memory bus sizes.
      Of course TFLOPS count too, but to a lesser extent.

  • @richard_loosemore
    @richard_loosemore 27 days ago

    Jeepers! Price for that 3090 is currently $1099.99 on Amazon (January 4th 2025) and yet you say “about 750 or 800, last time I checked”. 😢 The timestamp says that this video is only one month old, so is it really true that prices jumped by 50% in one month!?

    • @DigitalSpaceport
      @DigitalSpaceport 27 days ago +1

      It's 38% if 800 and 46% if 750, but yeah, it's a big increase. It started climbing mid-October IIRC and was hovering at 750-800 for a while.

    • @richard_loosemore
      @richard_loosemore 27 days ago

      @ I hadn't been tracking prices at all, so that rate of increase was a shock. That said, the all-up cost of your 4-GPU rig still comes in cheaper than my M4 Max.

  • @hotsauce246
    @hotsauce246 2 months ago

    Hello there. Regarding RAM speed, were you partially offloading the models in GGUF format? I am currently loading the EXL2 model completely into VRAM.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago

      No, the model was fully loaded to VRAM. This video tested multiple facets of CPU impact fairly decently: th-cam.com/video/qfqHAAjdTzk/w-d-xo.html

  • @TheYoutubes-f1s
    @TheYoutubes-f1s 2 months ago

    Are AM5 boards an option if you just want to do inference on three 3090s?

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago

      Yes an AM5 will work. Full lane support is only needed for training/tuning models and image/video gen.

  • @FabianOlesen
    @FabianOlesen 2 months ago

    I want to suggest a slightly lower tier: 2080 Tis that have been modified with 22GB memory, running a 2x system.

  • @AIbutterFlay
    @AIbutterFlay 2 months ago

    How much would it cost to buy all the components of a quad 3090 build?

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago

      I think you're looking for the cost analysis part of the video where I put that together; not sure if you saw that yet. It's here - th-cam.com/video/JN4EhaM7vyw/w-d-xo.html I would also suggest the H12SSL-i, which is about the same cost as the MZ32-AR0 right now, with a smaller board footprint and no SAS connector bridge needed to get the top PCIe slot running.

  • @adjilbeh
    @adjilbeh 2 months ago

    Hello, what do you think about 7x RTX A4000 or RTX 4000 Ada, the slim one with 20GB of VRAM and one slot? They have a lower TDP than an RTX 3090.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      The RTX workstation cards cost a bit more for the amount of VRAM but are great overall. My A5000s idle around 8W, just a bit under the 3090's 10W.

    • @Nick-tv5pu
      @Nick-tv5pu 1 month ago +1

      Faster, more modern architecture than the 3090s too. I have four 3090s and one RTX 4000 Ada SFF and was surprised by its performance.

  • @canoozie
    @canoozie 2 months ago

    My RTX A6000s idle at 23W so yeah, always on is expensive depending on your GPU config. I have 3x in each system, 2 systems in my lab.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      Mmmmmm 48GB vram each. So nice!!!

    • @canoozie
      @canoozie 2 months ago

      @@DigitalSpaceport Yes, they're nice. I was looking for a trio of A100s over a year ago and couldn't find them, so instead, I bought 6 A6000s because at least I could find them.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      If you think about it... I average 10-12W per 3090 24GB; the 23W per A6000 48GB seems to scale. Maybe idle is tied to VRAM amount also?

    • @canoozie
      @canoozie 2 months ago

      @@DigitalSpaceport That could be, but usually power scales with # of modules, not size. But then again, maybe you're right, because I looked at an 8x A100-SXM rig a while back, it idled each GPU between 48-50W and had 80GB per GPU.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      @canoozie My 3060 12GB idles at 5-6W, hmm. Interesting. Also, now I'm browsing eBay for A100s. SXM over PCIe, right? I'm probably not this crazy.

  • @KonstantinsQ
    @KonstantinsQ 2 months ago

    So I did not get it: are more cores better or worse for AI? For example, a Ryzen 9 5950X with 16 cores vs. a Ryzen 5 7600X with 6 cores?

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      It's not super black and white. More cores are useful when you run more models. I've seen a single model briefly max out 12 cores at the same time, others just 8. However, 1-2 cores always stay pegged for the entire output. I'd go 8-16 cores minimum myself. If you have a lot of additional services, factor those in additionally. High all-core turbo and single-thread speed come second behind GPUs.

  • @camelCased
    @camelCased 3 days ago

    3 t/s is the limit below which I get annoyed.
    Nowadays I can buy two new Asus 3060 12GB for 600 EUR total. A used Asus ROG Strix 3090 from a trustworthy seller costs 840 EUR. Hm, a dilemma... But not really, I don't have enough space for a new rig with two GPUs :(

  • @mrrift1
    @mrrift1 2 months ago

    What are your thoughts on getting 4 to 8 4060 Tis with 16GB VRAM?

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      64GB VRAM is a very solid amount that will run vision models and Nemotron easily at Q4, and it's not a bad card at all for inference.

    • @mrrift1
      @mrrift1 2 months ago

      @@DigitalSpaceport Thanks, I think I might have 8x 4060 Ti 16GB and will build a rig with them; I would love to hear any thoughts you have. I will have a budget of $3,500 US for the rest, and I have a Thermaltake Tower 900 Full Tower E-ATX gaming case (black) to start with.

  • @GeneEkimen
    @GeneEkimen 1 month ago

    Hello. Can you tell me, if I have a rig with 8x P106-100, what models can I use on it? I think they are very interesting graphics cards because you can buy them now for 10-15 dollars; maybe you could make a video with these cards. Thank you.

    • @DigitalSpaceport
      @DigitalSpaceport 1 month ago

      These are the 3GB VRAM Pascal headless 1060s IIRC?

    • @GeneEkimen
      @GeneEkimen 1 month ago

      @@DigitalSpaceport It is the 6GB VRAM Pascal 1060 (like the GTX 1060 6GB).

    • @DigitalSpaceport
      @DigitalSpaceport 1 month ago +1

      That's a good price for 6GB! Will investigate some.

  • @minedustry
    @minedustry 1 month ago

    If I added an AI translator to my game server, how many threads and how much video card would I need?

    • @DigitalSpaceport
      @DigitalSpaceport 1 month ago +1

      Is there software that does this specifically? Batching in parallel is fairly decently performant, but if it's just a message here or there, I'd start with finding a model that does translation the best (sorry, I'm not sure which one that is) and then checking what params it supports. Then size your card to a Q8 for that model. Don't forget to add about 20% for context.

    • @minedustry
      @minedustry 1 month ago

      It is just a few messages here and there. The slang, typos, and in-game jargon make all the translators that I've tried spit out garbage that just seems to create more confusion than help. I was hoping that you would know which software to use, but anyway, I should probably try something simple to learn on.

  • @Keeeeeeeeeeev
    @Keeeeeeeeeeev 2 months ago

    Can you mix AMD and NVIDIA GPUs together for inference?

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago +1

      Great question. Will test when I get an AMD card.

    • @Keeeeeeeeeeev
      @Keeeeeeeeeeev 1 month ago

      @@DigitalSpaceport Within 2 weeks, if I have enough time, I'll probably let you know. Just ordered the cheapest 12GB 3060 I've found to add alongside an RX 6800 😁

    • @Keeeeeeeeeeev
      @Keeeeeeeeeeev 1 month ago

      I quickly tried LM Studio and it was using both the AMD RX 6800 and the RTX 3060 at the same time to run a model. I tried Qwen and something else I don't remember. Unfortunately, my CPU decided to die 2 days ago. I'm stuck with just a laptop 😭 Until the new CPU arrives I cannot try anything else.

    • @DigitalSpaceport
      @DigitalSpaceport 1 month ago

      Oh nice, so llama.cpp does work with both at the same time. Any observation on whether it picked the AMD over the NVIDIA to be the lead runner? Usually the model VRAM split is higher on that GPU from what I have seen.

    • @Keeeeeeeeeeev
      @Keeeeeeeeeeev 1 month ago +1

      @DigitalSpaceport Didn't check; I only gave it a quick test, unfortunately... I'll try again with the new CPU when it arrives.

  • @ChrisAlfa-k6g
    @ChrisAlfa-k6g 1 month ago

    What do you think about this?
    1. Motherboard: MSI D3051GB2N-10G Micro ATX Server Motherboard, AMD Socket AM5, AMD B650, Dual 10GbE LAN, IPMI with KVM.
    2. CPU: AMD Ryzen 9 7950X, 16-Core/32-Thread, 4.5 GHz Base, 5.7 GHz Boost.
    3. Memory: NEMIX RAM 128GB (4x32GB) DDR5 4800MHz ECC Unbuffered UDIMM.
    4. GPU: XFX Speedster MERC310 AMD Radeon RX 7900 XTX Black, 24GB GDDR6, AMD RDNA 3.
    5. Power Supply: Thermaltake Toughpower GF3 1200W, ATX 3.0/PCIe 5.0 Ready, 80 Plus Gold, Fully Modular.
    6. Cooling Solution: 280mm AIO Liquid Cooler, AM5-compatible.
    7. Case: SilverStone CS382, Micro ATX, 8x hot-swap bays, supports up to 280mm radiator.
    8. NVMe Drive: Western Digital 500GB WD Red SN700 NVMe Internal SSD, Gen3 PCIe, M.2 2280, Up to 3,430 MB/s - WDS500G1R0C.
    9. SAS Drives: 8x Seagate Exos ST10000NM0096 10TB Enterprise SAS, 12Gb/s, 7.2K RPM, 256MB cache, 3.5".
    10. SAS Controller: 12G Internal PCI-E SAS/SATA HBA Controller Card, Broadcom SAS 3008, Compatible with SAS 9300-8I.
    11. SATA Drives: 4x Western Digital 500GB WD Red SA500 NAS SSD, SATA III 6Gb/s, 2.5"/7mm, Up to 560 MB/s - WDS500G1R0A.

  • @marianofernandez3600
    @marianofernandez3600 2 months ago

    What about CPU cache?

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago

      Doesn't seem to impact inference speed, interestingly, but it would need an engineering flamegraph to really profile it. Not a top factor for sure.

  • @demischi3018
    @demischi3018 24 days ago

    Well, well, well, it seems I have four 3090s lying around doing nothing. Guess I’d better get to work!

  •  2 months ago

    4 GPUs vs. 1 M4 Max, and it's getting half the tokens/sec. Powering 4 GPUs, the extra equipment needed to run them... Apple seems like a no-brainer.

    • @DigitalSpaceport
      @DigitalSpaceport 2 months ago

      Here is a timestamp comparing numbers on the same quants, M4 Max vs. quad 3090 on Ollama/llama.cpp. I'm not sure where you're getting that it's half as slow? th-cam.com/video/0sotx35yYVM/w-d-xo.html&si=TypOesYh-ujt2_AQ

    • @christender3614
      @christender3614 2 months ago

      I guess this is about the M4 Max being half as slow. I think there's a point. Getting an M4 Max MacBook (the 14-core version, though, which might be slower) doesn't cost a lot more than 4 used 3090s where I live. So it's way cheaper than a full system built around the 3090s. And it uses way less energy. And it's way more versatile. So depending on what you want, half the speed maybe isn't that big of a tradeoff. Though, as I said, I'm not sure if the 14-core version is comparable to the 16-core version, which is more expensive than a system built around 4 3090s.
      Edit: It also depends on what you need AI for. If you're looking to totally replace ChatGPT and ask questions all the time, speed matters more than if you're happy with ChatGPT and only need local AI for special tasks and/or some more private stuff.

    • @joelv4495
      @joelv4495 1 month ago +2

      @@christender3614 With the M4 Max MBP, you've gotta get the 16-core variant to get more than 36GB memory.

  • @gaiustacitus4242
    @gaiustacitus4242 1 month ago +1

    Even if llama.cpp were highly optimized for parallel processing, it would be impossible to achieve linear scaling across multiple GPUs.
    Also, when evaluating the Mac vs. RTX comparison, remember the old adage that "statistics don't lie, but liars do statistics." The NVIDIA benchmarks only run very small models which fit entirely within the GPU's VRAM. The performance of an RTX-based rig falls on its face when the model is pushed out into system RAM. Larger models which yield better results will run faster on M4 Max hardware because the memory is part of the system on a chip (SoC).
    FWIW, the only benchmark results I've found compare an NVIDIA RTX 4090 build against a baseline MacBook Pro M3 Max. The M4 Max neural processing unit (NPU) offers 38 TOPS performance, which is significantly better than the 18 TOPS of the M3 Max NPU. Granted, this is far below NVIDIA's claims of 320 TOPS for the RTX 3090 or 1,321 TOPS for the RTX 4090, but again, those numbers are only relevant for small LLMs (such as the 2.5B model used in the benchmarks) which fit entirely within the GPU VRAM.

  • @CheapPartscaster
    @CheapPartscaster 24 days ago

    Nice video! Sub and thumbs up. Seems we share a passion.
    I have an older Ryzen 3950x CPU, an RTX 3070 8 GB and 64 MB RAM. I can run models like gemma2:27b-instruct-q2_K on that, at 6tokens/sec. That more than enough speed for me. I like to ponder on the responses I get. On larger models both CPU and GPU run simultaneously, like 75% CPU and 25% GPU. Some 70b q4 models run only CPU - slow at 1-2 token/sec, but "possible".
    Btw. Does anyone know the ollama /save ? It only "remembers" one generation of conversation back (by saving model and context window of the "current" session?). Is there a way to increase the amount of conversation saved, including older conversations?
    (That way you could create a "continuous" person between sessions. It sort of works with only "one generation". The model always remember "where we were". Oh, the meta-physics of this is mind-bending. :)
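
One workaround for the question above (an illustration, not the /save mechanism itself): keep the full message history yourself on disk and replay it through Ollama's chat endpoint with a larger context window. The num_ctx value is a placeholder, and how much history the model actually sees is still bounded by that context size.

```python
import json, pathlib, urllib.request

# Keep a "continuous" conversation across sessions: store the full message
# history in a file and replay it through Ollama's /api/chat endpoint.
URL = "http://localhost:11434/api/chat"
HISTORY = pathlib.Path("conversation.json")

messages = json.loads(HISTORY.read_text()) if HISTORY.exists() else []
messages.append({"role": "user", "content": "Where were we last time?"})

body = json.dumps({
    "model": "gemma2:27b-instruct-q2_K",
    "messages": messages,
    "options": {"num_ctx": 8192},   # placeholder; history is still capped by this
    "stream": False,
}).encode()
req = urllib.request.Request(URL, data=body,
                             headers={"Content-Type": "application/json"})
reply = json.load(urllib.request.urlopen(req))["message"]

print(reply["content"])
messages.append(reply)                       # persist the assistant turn too
HISTORY.write_text(json.dumps(messages, indent=2))
```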

  • @drinerqc7434
    @drinerqc7434 1 month ago

    7900 xtx price is so low 🤔

    • @DigitalSpaceport
      @DigitalSpaceport 1 month ago +1

      The comments from AMD owners on the software side of their Linux drivers have to be factored in heavily. People have had very negative experiences. It does seem like Lisa Su just had a very productive meeting on the software stack for ROCm and indicated there was big work coming down the pipeline. Hardware-wise, their specs are awesome.