Setting Up External Server GPUs for AI/ML/DL - RTX 3090

  • Published Mar 28, 2024
  • 🚀 Join me on a deep dive into setting up external GPUs for high-performance AI computations! In this comprehensive guide, I'll walk you through connecting powerhouse GPUs like the NVIDIA RTX 3090 with NVLink to servers such as the Supermicro 4028GR-TRT and Dell PowerEdge R720. We'll cover everything from hardware requirements to configuration steps.
    ⚙️ What you'll learn:
    - The advantages of using external GPUs for AI, machine learning, and deep learning tasks.
    - Step-by-step instructions on how to configure and connect your GPUs to your servers.
    - Tips for optimizing your setup for maximum performance and efficiency.
    🖥️ Equipment covered:
    Supermicro 4028GR-TRT
    Dell PowerEdge R720
    NVIDIA GeForce RTX 3090
    🔧 Whether you're setting up a machine learning lab or looking to boost your deep learning projects, this video is your go-to resource for external GPU setups. Get ready to power up your AI/ML/DL workloads with unprecedented speed!
    #ExternalGPU #AIComputing #MachineLearning #DeepLearning #TechGuide #RTX3090 #SuperMicro #DellPowerEdge #AIHardware
    🎥 Other Videos in the Series:
    Part 1 | Introduction | Your New 8 GPU AI Daily Driver Rig: Supermicro SuperServer 4028GR-TRT | • Your New 8 GPU AI Dail...
    Part 2 | Server Setup | 8 GPU Server Setup for AI/ML/DL: Supermicro SuperServer 4028GR-TRT | • 8 GPU Server Setup for...
    📚 Additional Resources:
    Link to Cost Breakdown Spreadsheet
    docs.google.com/spreadsheets/...
    AI/ML/DL GPU Buying Guide 2023: Get the Most AI Power for Your Budget
    • AI/ML/DL GPU Buying Gu...
    HOW TO GET IN CONTACT WITH ME
    🐦 X (Formerly Twitter): @TheDataDaddi
    📧 Email: skingutube22@gmail.com
    💬 Discord: / discord
    Feel free to connect with me on X (Formerly Twitter) or shoot me an email for any inquiries, questions, collaborations, or just to say hello! 👋
    HOW TO SUPPORT MY CHANNEL
    If you found this content useful, please consider buying me a coffee at the link below. This goes a long way in helping me through grad school and allows me to continue making the best content possible.
    Buy Me a Coffee
    www.buymeacoffee.com/TheDataD...
    Thanks for your support!
  • Science & Technology

Comments • 46

  • @CrashLoopBackOff-K8s
    @CrashLoopBackOff-K8s 2 months ago +1

    Just leaving a comment to help the channel grow and let you know that I appreciate the content you're putting out. You may have covered this in another video, but can you please explain the power setup for your lab? I'm assuming a dedicated 20A circuit for the lab, but any details around the power consumption, UPS(s), and any gotchas you've encountered would be very beneficial. Thanks and looking forward to seeing the channel move forward.

    • @TheDataDaddi
      @TheDataDaddi  2 months ago

      Hi there. Thank you so much for commenting. It really does help out the channel.
      So I have a power, heat, and sound video for the Dell R720s I have (linked below), but I have not done one for the full home lab. I was planning on doing one soon, once it is fully modified; I still have a few more things to do. I currently have one 20A circuit run for the lab, and it seems to be fine for my needs thus far. However, I am going to run a second 20A circuit over there soon so that I can expand in the future if needed. Stay tuned for a full video on this soon.
      th-cam.com/video/tmMx8AouTGA/w-d-xo.html
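      For anyone doing the same math on their own circuit, here is a rough back-of-the-envelope sketch. It assumes a 120V/20A branch circuit, the common 80% continuous-load rule of thumb, and purely illustrative per-device wattages; measure your own draw before relying on it.

```python
# Rough headroom estimate for a homelab on a single 120V/20A branch circuit.
# The 80% continuous-load derating and the per-device wattages below are
# illustrative assumptions only -- substitute measured values (PSU/IPMI/iDRAC readings).

VOLTS = 120
BREAKER_AMPS = 20
CONTINUOUS_FACTOR = 0.8  # common rule of thumb for continuous loads

usable_watts = VOLTS * BREAKER_AMPS * CONTINUOUS_FACTOR  # ~1920 W usable

# Hypothetical steady-state draws in watts.
loads = {
    "GPU server under load": 1200,
    "Dell R720": 350,
    "Networking / misc": 100,
}

total = sum(loads.values())
print(f"Usable capacity: {usable_watts:.0f} W")
print(f"Estimated load:  {total} W")
print(f"Headroom:        {usable_watts - total:.0f} W")
```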

  • @birdstrikes
    @birdstrikes 2 months ago +1

    Good stuff

    • @TheDataDaddi
      @TheDataDaddi  2 months ago

      Thanks so much for the kind words!

  • @arkadia8330
    @arkadia8330 2 months ago +1

    Hey. Thanks for all the materials that you create (the GPU Excel sheet, etc.). I have a question: would you recommend investing a little bit more to build an SXM platform for a home lab? Something with cheap P100 16GB SXM2 cards, and later swapping to V100 32GB SXM2 as well?

    • @TheDataDaddi
      @TheDataDaddi  2 months ago +1

      Hi there. Thanks so much for the comment! So glad the content is useful to you.
      I will be honest with you: I am not the most knowledgeable about SXM in general, but I certainly think it would be cool to experiment with, and I cannot speak to the performance benefits of SXM over more traditional form factors. One thing I have heard about SXM is that it can be difficult to set up correctly. I would say this, though: non-SXM GPUs are much more common, so they are likely more plentiful, easier to trade and upgrade, and there is more of a community around how to install and set them up. Personally, I would probably avoid the SXM form factor because of the added cost and murkiness around setup, but if you want the extra performance gains or just want to experiment with it, I would say go for it. I would be extremely interested to hear how it goes if you take that route.
      Maybe one day when the channel gets bigger I can buy some SXM GPUs and make some videos on how all that works and how the performance compares with the traditional form factor.
      I am sorry I could not provide more guidance here, but I hope this helps! Please do let me know what you decide to do.

  • @asganaway
    @asganaway 1 month ago +1

    Wait, is the KDE GUI not occupying any RAM on any GPU @41:07?
    When I was working on a similarly sized workstation, it was annoying to see Gnome be a little too hungry..
    P.S.
    New to your channel and just subscribed. Good job. Waiting for the benchmarking, in particular how the P40 will perform with half precision; my experience with that generation is that maybe it's not worth it compared to double, but I may be wrong.

    • @TheDataDaddi
      @TheDataDaddi  1 month ago +1

      Hi there! Thanks so much for the comment and the subscription! I really appreciate that.
      To my knowledge, the GUI rendering is handled by the ASPEED AST2400 BMC (Baseboard Management Controller), which provides the motherboard's built-in graphics. Yeah, I have read that Gnome can be like that. It was one of the reasons I went with KDE Plasma instead.
      Yep, I am working on finishing up a preliminary benchmark suite. As soon as I get it finished, I will do a video on the P40 vs P100 and open source the project via a GitHub repo.
      Thanks again!
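      For anyone who wants to verify this on their own machine, here is a minimal sketch using the nvidia-ml-py (pynvml) bindings. It lists each GPU's memory usage and any graphics processes, so you can confirm the desktop is not sitting on a CUDA GPU. It assumes the NVIDIA driver is installed and the package is available via pip install nvidia-ml-py.

```python
# Show per-GPU memory usage and graphics (display) processes.
# If the GUI is rendered by the BMC, the GPUs should show no graphics processes.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    handle = pynvml.nvmlDeviceGetHandleByIndex(i)
    name = pynvml.nvmlDeviceGetName(handle)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
    gfx = pynvml.nvmlDeviceGetGraphicsRunningProcesses(handle)
    print(f"GPU {i} ({name}): {mem.used / 1024**2:.0f} MiB used, "
          f"{len(gfx)} graphics process(es)")
pynvml.nvmlShutdown()
```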

    • @asganaway
      @asganaway 1 month ago

      @@TheDataDaddi Yeah, the embedded graphics controller; I suspected that. Mine doesn't have it, but I have to say, with that kind of setup it is very convenient to have; at least you won't experience the frustration of a GPU not fitting the whole model because someone is logged in using Firefox :D
      Keep us posted on the benchmark project; if I can, I'll run it myself on the hardware I have.

  • @sphinctoralcontrol
    @sphinctoralcontrol 10 days ago

    Tossing up between 3090s, A4000, and P40/P100 cards for my use case, which would not exactly be ML/DL but rather local LLM usage hosted with something like Ollama and various (I assume at least q4) models with higher parameter counts. I'm also dabbling with Stable Diffusion as well; at the moment I am shocked I'm able to run q4-quantized LLMs via LM Studio as well as Stable Diffusion models on my little old aging M1 2020 MacBook Air with 16GB RAM. I'm getting into the homelab idea, especially the idea of using a Proxmox server to spin up different VMs (including the Mac ecosystem) with way higher resources than what I'm working with currently. I'm also looking to integrate a NAS and other homelab services for media, but the GPU component is where I'm a little hung up: just what tier of card, exactly, is needed for this sort of use case? Am I nuts to think I could run some of the less quantized (as in, higher q number) LLMs on the low-profile cards, as well as SD? It's been 10+ years since I've built a PC and I'm totally out of my element in terms of knowing just how good I've got it using the M series of chips. I've even been hearing of people running this sort of setup on a 192GB RAM M2 Ultra Mac Studio, but would really love to get out of the Apple hardware if possible. I realize this was several questions by now... but, to distill this down, GPU thoughts? lol

    • @TheDataDaddi
      @TheDataDaddi  7 days ago

      Hi there. Thanks so much for your question!
      Yeah, so this is a really good question. It really depends on the size of the model you are trying to run. For example, to host Llama 2 70B at FP16 you need approximately 140GB of VRAM. However, you could run quantized versions with much less, or you could always work with the smaller model sizes. In terms of GPUs, I would recommend ones with at least 24GB of VRAM. I have been looking at this a lot for my next build, and I think I actually like the Titan RTX best. The RTX 3090 would also be a good choice; its FP16 performance just isn't as good. I think the P40/P100 are also great GPUs for the price, but for LLMs specifically they may not be the greatest options, because the P100 has only 16GB of VRAM and the P40 has very poor FP16 performance. Another off-the-wall option is to look at the V100 SXM2 32GB. Since these are SXM2, they are cheaper, but there are a lot fewer servers that they will fit in; the only ones I know of off the top of my head are the Dell C4140/C4130. From my research, the SXM2 GPUs are also fairly tricky to install. Anyway, these are the routes I would go to build a rig to host these models locally. I will eventually build a cluster to host and train these models locally, so stay tuned for videos to come on that subject if you are interested.
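      For rough sizing, the VRAM math above can be sketched out like this. It is illustrative only: real usage adds overhead for the KV cache, activations, and the CUDA context, and the 20% overhead factor here is just an assumption.

```python
# Back-of-the-envelope VRAM estimate for hosting an LLM's weights.
# Treat these numbers as a floor, not a guarantee.

BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "q4": 0.5}

def weight_vram_gb(params_billions: float, precision: str, overhead: float = 1.2) -> float:
    """Approximate VRAM in GB for model weights at a given precision."""
    weight_bytes = params_billions * 1e9 * BYTES_PER_PARAM[precision]
    return weight_bytes * overhead / 1e9

for precision in ("fp16", "int8", "q4"):
    print(f"70B model @ {precision}: ~{weight_vram_gb(70, precision):.0f} GB")
# fp16 comes out to ~168 GB with 20% overhead; the weights alone are the ~140 GB figure above.
```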

  • @mannfamilyMI
    @mannfamilyMI 1 month ago

    Would love to see the inference speed of something like Llama on the P100 and P40 cards. I have dual 3090s, so I'm familiar with that, but I'm looking to go to 8x to gain more VRAM, and I don't want the complexity of consumer cards. Have you considered the AMD MI100 card, by chance?

    • @TheDataDaddi
      @TheDataDaddi  1 month ago

      I am working on some benchmarks now. I will try to get some quick ones for some of the open-source LLMs, because everyone seems to be most interested in those at the moment. Stay tuned, and I will try my best to get them out as quickly as I can.
      I have not personally gone the AMD route yet, but it's something I plan on experimenting with down the road. It's funny you mention the MI100 card, though. I was actually talking to a viewer the other day, and he was telling me about his experiences with AMD and the MI100. To summarize his experience: "AMD is not worth it if you value your time, but once it's working it is fairly decent and a good alternative to NVIDIA."
      If you are interested in this route, please reach out to me. You can find my contact info in my YouTube bio, and I can try to put you in touch with the other viewer.

  • @LetsChess1
    @LetsChess1 1 month ago +1

    So I've got the same 4028GR-TRT. With the RTX 3090s you've got, you can't fit them in the server; however, the 3090 Founders Edition can fit in the case. It takes a little bit of forcing, but they will fit, as the power comes out the side with that weird adapter. That way you don't have to run a whole bunch of crazy risers all over the place.

    • @TheDataDaddi
      @TheDataDaddi  27 days ago

      Hey there. Thanks so much for the comment!
      Ah, cool! This is great to know. I figured that the Founders Edition would at least have a shot at fitting, but I wasn't sure, so I went the external route. As for the adapter, are you talking about something like this?
      www.ebay.com/itm/134077093756

  • @MattJonesYT
    @MattJonesYT 1 month ago +1

    When you get it running, I will be very interested to see where the bottleneck is and whether it's the CPUs or the GPUs, because the older Xeons seem to be a bottleneck.

    • @TheDataDaddi
      @TheDataDaddi  1 month ago

      I will eventually make a whole series devoted to LLMs and the best setups just for that. In that series, I will definitely report back on where the bottlenecks are. Unfortunately, my time is just incredibly limited these days. I will try to get this info out as soon as I can.

  • @HemanthSatya-eo4rq
    @HemanthSatya-eo4rq 2 months ago +1

    I'm new to this field. Could you make a video on a PC build for AI/ML/DL? I'm thinking of building one to access remotely using AnyDesk for computational work when needed. What do you recommend, a server or a PC build? Could you please suggest?

    • @TheDataDaddi
      @TheDataDaddi  2 months ago

      So I actually have another video series on a pc build for AI/ML/DL. I will link below.
      th-cam.com/video/8JGe3u7_eqU/w-d-xo.html
      I think the server route is more cost effective and faster to set up; however, when creating a custom build you have more control. Personally, I prefer the server route for the cost and time savings.

    • @HemanthSatya-eo4rq
      @HemanthSatya-eo4rq 2 months ago +1

      @@TheDataDaddi Thanks sir!

    • @TheDataDaddi
      @TheDataDaddi  2 months ago

      @@HemanthSatya-eo4rq Sure! No problem. Happy to help!

  • @smoofwah3552
    @smoofwah3552 1 month ago +1

    Hmm, I'm still watching, but I'm looking to build a deep learning rig to host Llama 3 400B dense. How many 3090s can I put together, and how do I learn to do that?

    • @TheDataDaddi
      @TheDataDaddi  1 month ago

      Hi there! Thanks so much for the comment.
      I am not really sure it is possible for you to host Llama 3 400B dense on one machine (without quantization). By my calculation, you would need close to 1TB of VRAM to hold the model. This would need to be split across many GPUs. Even if you used many H100 or A100 GPUs, you would likely not be able to host it in a single machine; it would take something like 12 or more of either, and I do not know of any single server that could support this.
      In this particular case, the Supermicro 4028GR-TRT shown in this video could theoretically handle up to 8 RTX 3090s rigged externally. That is only going to give you about 192GB of VRAM. You might be able to get away with hosting the full 70B model without quantization with that amount of VRAM. However, for the 400B dense model you are still a long way off.
      To host a model of that size, a much more practical and realistic approach would be to set up a distributed computing cluster with many GPUs across several servers. In a distributed setup, each server handles a portion of the model, allowing you to spread the computational load and memory requirements across several nodes. This approach not only makes it feasible to host large models like Llama 3 400B dense, but it also improves overall throughput through parallel processing.
      To implement this, you might try frameworks that support distributed deep learning, such as TensorFlow with tf.distribute.Strategy or PyTorch with torch.distributed. These frameworks help manage the distribution of data and computation, ensuring that the workload is evenly spread and that the nodes synchronize effectively.
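      As a concrete starting point, here is a minimal sketch of the PyTorch torch.distributed / DistributedDataParallel pattern mentioned above. The tiny Linear model and the hyperparameters are placeholders; a model the size of Llama 3 400B would also need model parallelism (FSDP, DeepSpeed, or tensor parallelism), not just data parallelism.

```python
# Minimal multi-GPU / multi-node training skeleton using torch.distributed + DDP.
# Launch with, e.g.: torchrun --nnodes=2 --nproc_per_node=8 train.py
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")        # NCCL handles GPU-to-GPU comms
    local_rank = int(os.environ["LOCAL_RANK"])     # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).cuda(local_rank)  # placeholder model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for _ in range(10):                            # dummy training loop
        x = torch.randn(8, 4096, device=local_rank)
        loss = model(x).pow(2).mean()
        optimizer.zero_grad()
        loss.backward()                            # gradients are all-reduced across ranks
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```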

  • @JimCareyMulligan
    @JimCareyMulligan 2 months ago +1

    There are risers with an Oculink interface. They are more expensive, but they have more compact and longer cables (up to 1m, I believe). You can connect up to 4 cards to a single x16 PCIe slot, if 4 PCIe lanes per GPU are enough for your tasks.

    • @TheDataDaddi
      @TheDataDaddi  2 months ago

      Hi there! Thanks so much for the comment.
      Interesting. I have never heard of oculink, but I will certainly check it out! Thanks so much for the heads up.

    • @KiraSlith
      @KiraSlith 2 months ago +1

      I'mma have to fourth this suggestion. As-is that's a mildly terrifying setup. Typical PCIe extensions only come in certain lengths due to data integrity concerns with PCIe 4.0 and up. Quality Oculink cables have embedded redrivers to ensure link integrity and are tear-out resistant, which is MUCH safer...

    • @TheDataDaddi
      @TheDataDaddi  2 months ago +1

      Interesting. That explains why the long ones were so difficult to find. I haven't seen any issues with data integrity so far, but I will certainly look into this. Thanks for the comment! @@KiraSlith

    • @TheDataDaddi
      @TheDataDaddi  2 months ago +1

      I have been looking, and so far I can really only find Oculink cables in x4 and x8 lane configurations. I have not seen anything for the full x16 lanes; they seem to be mostly for connecting SSDs. Do you have an example of one that could be used for x16 lanes to replace the PCIe extenders I used in my setup? I am struggling to find anything that looks like it would work. @@KiraSlith

    • @KiraSlith
      @KiraSlith 2 months ago +1

      @@TheDataDaddi You'll have to run 2 Oculink cables if you want the full x16 on both ends. Chenyang makes the x16 to dual Oculink 8i card for the server's end, and "Micro SATA Cables" (it's their brand name) sells the receiving adapter for the GPU, "Oculink 8i Dual Port to PCIe x16 Slot". Slot A on the card goes to CN1 on the receiver, Slot B on the card goes to CN2 on the receiver. Don't mix them up or you'll probably get caught in a boot loop.
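      Whichever riser or Oculink adapter ends up in the chain, it is worth checking what link width and generation each GPU actually negotiated after boot. A small sketch using the nvidia-ml-py (pynvml) bindings, assuming NVIDIA GPUs and driver:

```python
# Report current vs. maximum PCIe link width and generation for each GPU,
# useful for verifying what a riser or Oculink adapter actually negotiated.
import pynvml

pynvml.nvmlInit()
for i in range(pynvml.nvmlDeviceGetCount()):
    h = pynvml.nvmlDeviceGetHandleByIndex(i)
    cur_w = pynvml.nvmlDeviceGetCurrPcieLinkWidth(h)
    max_w = pynvml.nvmlDeviceGetMaxPcieLinkWidth(h)
    cur_g = pynvml.nvmlDeviceGetCurrPcieLinkGeneration(h)
    max_g = pynvml.nvmlDeviceGetMaxPcieLinkGeneration(h)
    print(f"GPU {i}: x{cur_w} of x{max_w} lanes, Gen {cur_g} of Gen {max_g}")
pynvml.nvmlShutdown()
```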

  • @ericgolub8589
    @ericgolub8589 2 months ago

    Hi, I was wondering if you could give me the basic specs of what you think is the optimal $2,000 ML rig. I'm imagining it might have 4 P40s or similar, but I can't find any cheap server that takes 4 GPUs. The R720s appear to support a max of 2 GPUs, and then the expensive Supermicro supports up to 8. Is there an intermediate solution? Thanks for your help.

    • @TheDataDaddi
      @TheDataDaddi  2 months ago +1

      Hey Eric! I spent some time this morning looking for something that would fit your particular situation. Please take a look at the following and let me know what you think:
      ASUS ESC4000 Server - $499.00
      www.ebay.com/itm/134879048174?mkcid=16&mkevt=1&mkrid=711-127632-2357-0&ssspo=rI5jpFxtSLW&sssrc=2047675&ssuid=xv3fo9_stiq&widget_ver=artemis&media=COPY
      P40 GPUs x4 - $149.99 x 4 = $599.96
      www.ebay.com/itm/196310785399?mkcid=16&mkevt=1&mkrid=711-127632-2357-0&ssspo=DDOYB0ZoRzO&sssrc=2047675&ssuid=xv3fo9_stiq&widget_ver=artemis&media=COPY
      3.5" HDD 10TB - 69.99 x 4 = $279.96
      www.ebay.com/itm/156130335844?mkcid=16&mkevt=1&mkrid=711-127632-2357-0&ssspo=kE7Z0-UgRW6&sssrc=2047675&ssuid=xv3fo9_stiq&widget_ver=artemis&media=COPY
      Power Cords - $9.56 x 2 = $19.12
      64GB DDR4 RDIMM Modules (Optional) - $99.99 x 4 = $399.96 (DOUBLE CHECK THIS RAM WILL WORK WITH THIS SERVER)
      www.ebay.com/itm/224440216180?mkcid=16&mkevt=1&mkrid=711-127632-2357-0&ssspo=hCBgAHuqSCi&sssrc=2047675&ssuid=xv3fo9_stiq&widget_ver=artemis&media=COPY
      Server Rails (Optional) (DOUBLE CHECK COMPATIBILITY) - $150.23
      www.newegg.com/p/1B4-005K-01646
      GRAND TOTAL: $1,948.23
      Other considerations:
      This server does not come with a RAID controller, so if you want one, that will be a bit more. I would personally recommend software RAID, as there are some pretty good options out there. Also, you do not need the RAM, but I would recommend getting at least 256GB. You can also just add to the RAM you already have, which would be cheaper, but I like to go with more memory-dense modules to allow room for expansion. Finally, this is just a suggestion, so please do your own research before you actually buy everything to make sure it will all work together. I have done my best to ensure compatibility, but double check me for sure before you buy. You will also need to take into account shipping costs and taxes for your area, so it will likely come in a little above the $2K mark. If that is a hard limit for you, you can always remove things from the suggested setup and add them later (or not at all). With that said, please feel free to change the setup to whatever makes the most sense for you. This is just what I would do.
      Hope this helps, and please let me know how it goes for you!

  • @ALEFA-ID
    @ALEFA-ID 2 months ago

    Do we need NVIDIA SLI to fine-tune an LLM with multiple GPUs?

    • @TheDataDaddi
      @TheDataDaddi  1 month ago

      Hi there. Thanks so much for the comment.
      First and foremost, NVIDIA SLI is different from NVLink. SLI is primarily designed for linking two or more GPUs together to produce a single output for graphically intensive applications (gaming in particular). It is not really designed for AI/ML/DL and is not really used for that purpose to my knowledge. Also, the 3090 does not use SLI; it uses NVLink.
      As for NVLink, you do not necessarily need it. It does not change the total memory pool of each GPU, but it is certainly nice to have. It will significantly speed up many multi-GPU operations, since it allows the GPUs to communicate directly at much higher bandwidth than PCIe (on the order of 100 GB/s with the 3090's NVLink bridge). So, it will not prevent you from working with LLMs, but it will make things much faster when dealing with them, if that makes sense.
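      For anyone who wants to check whether a pair of GPUs can talk to each other directly (over NVLink or PCIe peer-to-peer), a quick PyTorch sketch:

```python
# Check peer-to-peer access between every GPU pair. True means direct
# GPU-to-GPU access is possible (via NVLink or PCIe P2P, depending on the platform).
import torch

n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'available' if ok else 'unavailable'}")
```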

  • @gileneusz
    @gileneusz 2 months ago +1

    32:50 you could essentially do it using a clothes drying rack

    • @TheDataDaddi
      @TheDataDaddi  2 months ago

      Basically. Lol. It's not a very elegant solution, but it actually works pretty well. I wasn't sure if I liked it at first, but it has been working out quite well, and it keeps the GPUs surprisingly cool.

  • @gileneusz
    @gileneusz 2 months ago

    40:55 My thoughts are that it would be much easier to cut the cover, or do some kind of 3D-printed cover with holes for the 3090 or 4090 PSU connectors. The only obstacle would be providing a sufficient amount of power to the 4090, but you can always limit the power of the card... or just add a PSU on top of the server. Still, 8x 4090/3090 won't fit widthwise.

    • @TheDataDaddi
      @TheDataDaddi  2 months ago

      Yeah, you are probably right here. Cutting the cover or making a 3D-printed cover is certainly an interesting idea. I actually thought about 3D printing a mount for the GPUs, but I just haven't had time to design one. I am not sure what the power output is for each of the 12V EPS connectors; it may actually be enough to power the 3090s. I don't know about the 4090s, though. In any case, you could likely only fit 4 GPUs in the server if you went that route, so there should be plenty of power in the server (theoretically). There may actually be some other form factors that would allow you to fit all 8; I am just not sure.

    • @gileneusz
      @gileneusz 2 months ago +1

      @@TheDataDaddi I'm considering buying 6-8x 4090s and just playing with the cover. Those 4090s would be 2 PCI slots wide with individual water cooling. But still, the server would be loud - ah, I want it to be silent.... ah, no good solutions. All these factors add up to a big mess ;)

    • @TheDataDaddi
      @TheDataDaddi  2 months ago

      Yeah, the silent part is really going to be your biggest issue. The SM4028GR-TRT is truly the loudest server I have ever worked with. On boot and under heavy load it can get above 90 dB. However, at idle it's not really too bad. I would be really curious to know if you can fit 8x 4090s. Please keep me updated on your journey if you remember! @@gileneusz

    • @gileneusz
      @gileneusz 2 months ago +1

      @@TheDataDaddi Sure. I'm leaning towards buying 2x A6000 to get 96GB of VRAM on a desktop, without any server... I have 2 cats, and they hate noise.... 😅 But I'm also thinking about an alternative: buying an SXM4 server and populating it with 4x A100 40GB. I can buy them used on eBay for $5k each, but reselling those things later would be almost impossible...

    • @TheDataDaddi
      @TheDataDaddi  2 months ago

      2x A6000 would be a good route if you are looking to keep things quiet. Thankfully, unless mine are booting up, my cat doesn't mind the noise. lol. As far as the SXM4 A100s are concerned, this would definitely be an interesting route. I think you could probably resell them, actually; you just might have to be more patient, as most people do not have SXM4 servers. @@gileneusz

  • @ydgameplaysyt
    @ydgameplaysyt 1 month ago +1

    tokens/s?

    • @TheDataDaddi
      @TheDataDaddi  27 days ago

      Hi there. Thanks so much for your comment!
      Could you provide a bit more context for your question? I would be happy to help. I am just not sure exactly what you are asking here.