I saw a Reddit post of a guy running 4x P100 16GB for under $1300 and getting 30 tokens a second with vLLM on 70B Llama 3 lol. I'm so happy to see other builds like dual 3090s too. So far I have managed to pick up one Titan RTX, and I'm hoping to shoot for a 3090 or another Titan RTX.
It's been very cool to see the use cases of older cards for local LLM setups. I want to grab a Tesla P40 at some point and put it in a restomod LLM PC, if nothing more than for the cool factor of how it looks.
That original pc case is actually really cool
It is an old Lian Li case. I agree, and I actually bought another similar older case because I liked it so much.
Excellent, I am getting ready to build something like this, using an Epyc CPU and Supermicro MB.
That will be a great system. It's been a ton of fun to have and the amount of new repos I am finding that allow me to take full advantage of the system has really been enjoyable.
Titan RTXs can bridge memory for 48GB of VRAM; at half precision, that's as good as 96GB. They can do some serious AI work. The only other similar option is the A6000, but it costs 2x as much as a pair of Titans. They're from the RTX 2000 generation, so a bit slower than a 3080, but not by much.
I did not know that the Titans can pool memory. That would be very useful for some of the image/video generation models, as even with an NVLink the pair won't appear as one card (at least from what I'm aware of RE my 3090s).
@@enilenis Very good point. Yes, in the Open Sora test I did, and based on others' feedback when trying it, the bottleneck is always the VRAM. I would be fine with trading speed for more VRAM.
Great video!
Thanks very much!
Great! Thanks for sharing, that’s exactly what I’m trying to do.
Sure thing! It is a fun setup indeed!
Do you plan to use NVLink with the new Ryzen setup?
It is something I would like to add once I swap over to a Threadripper. I have seen conflicting opinions on how much it helps, but I would like it for "completeness" if nothing more.
@@OminousIndustries It's true, I work in the VFX industry. We used 3090s a lot in pooled render rigs. It's one feature I miss in the 4090.
Good setup. The only issue I have with it is how close the GPUs are stacked together. I understand that these are some thickkk boy GPUs, but you should have at least ONE PCIe slot of separation MINIMUM between these 2 GPUs. The least of your worries will be the top GPU overheating; the CPU will get too hot for too long.
Yes, it was pretty bad cooling-wise. The GPUs are now watercooled and do not get very hot at all anymore!
The video is very cool; the 3090s could look very beautiful in a nicer case.
Thanks very much! I am going to be swapping everything over into a Thermaltake View 71 case very soon.
Unlike the inside of that case!
Thanks for this video.
Sure thing, thanks!
Bro, use nvtop. You're welcome.
I'm going to install that tonight for my Intel GPU build; I previously hadn't found a monitor for that GPU on Linux.
Thank you for suggesting it. Super useful!
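For anyone who would rather poll the NVIDIA cards from a script than keep nvtop open, here is a minimal sketch using the nvidia-ml-py package (pynvml); it only covers NVIDIA GPUs and the output formatting is just an example, so treat it as a starting point rather than a replacement for nvtop.

```python
# Minimal GPU poll using nvidia-ml-py (pip install nvidia-ml-py).
# Prints temperature, utilization and VRAM usage for each NVIDIA card.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        if isinstance(name, bytes):  # older bindings return bytes
            name = name.decode()
        temp = pynvml.nvmlDeviceGetTemperature(handle, pynvml.NVML_TEMPERATURE_GPU)
        util = pynvml.nvmlDeviceGetUtilizationRates(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i} ({name}): {temp}C, {util.gpu}% util, "
              f"{mem.used / 1024**3:.1f}/{mem.total / 1024**3:.1f} GiB VRAM")
finally:
    pynvml.nvmlShutdown()
```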
Could this be used as a badass gaming computer? Would two 3090Ti’s run games faster than a single 4090?
I am not much of a gamer, but from what I have read, you're better off with a single 4090 as it is faster and can take advantage of newly released game-centric technologies. I do not believe VRAM is as big of a consideration for gaming as it is for something like LLMs, which is what this machine was made for.
I am awaiting my second 3090 ti, probably going to end up water cooling. How has it been for you with heat management?
I have not seen crazy temps while running local LLMs. I did render something in KeyShot Pro that made the cards far too hot, but for any LLM stuff it hasn't been too bad at all.
I wonder how hot the GPUs will get at full load?
They would get too hot, I ended up liquid cooling them as I was not comfortable with the potential heat they would generate at full load.
Would connecting the two 3090 Tis with NVLink make it more capable of handling AI models?
I have heard conflicting info on this, so I won't speak with certainty. With that said, it is my understanding that if you're just running LLMs and such, it won't make a noticeable difference. I have heard that in training instances it may add some benefit, but I am not able to verify that myself.
@@OminousIndustries The newer PCIe ports can substitute for NVLink. There will be a time delay in training etc., but not that bad, and they can do mem pooling as well.
Hello, I'm just building a setup with two 4080 Founders Edition cards on the Asus Z790 WiFi board, but I'm wondering if there will be a problem with cooling and temperatures, since these are such large 3-slot cards sitting one above the other. Do you have any problems with that?
Honestly, yes. I would look for alternate cooling solutions for the long term. If I have both cards training a model, the top card has sometimes gone north of 85C, which is not good. I believe as an alternate solution you can also power limit the cards to help prevent them from getting too hot, but at this point I am seriously looking into some water cooling solutions.
@@OminousIndustries You don't tray mount one card vertically?
@@rafal_mazur Not a possibility in this current setup. Once I swap everything over to the Thermaltake View 71 I have (embarrassingly) had sitting in the box for a few months, I will explore alternate mounting options, though I will likely just water cool the system at that point since I will have to "rebuild" it all anyways.
Nice! Need more videos like this 😶😶
Thanks very much !!
Extremely toasty if you do not lock the power limit under 300W.
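On the power-limit point above: here is a minimal sketch of capping both cards with nvidia-smi, driven from Python. The 300 W cap and the GPU indices 0 and 1 are only examples for a two-card box; it needs root/sudo, and you can check each card's supported range first with nvidia-smi -q -d POWER.

```python
# Hedged sketch: cap two GPUs at 300 W using nvidia-smi (requires root/sudo).
# The wattage and indices are illustrative; adjust them for your own cards.
import subprocess

POWER_LIMIT_W = 300

for gpu_index in (0, 1):
    subprocess.run(
        ["nvidia-smi", "-i", str(gpu_index), "-pl", str(POWER_LIMIT_W)],
        check=True,
    )

# Confirm the new limits took effect.
subprocess.run(
    ["nvidia-smi", "--query-gpu=index,power.limit", "--format=csv"],
    check=True,
)
```

Note that the limit resets on reboot, so it has to be reapplied (e.g., from a startup script) if you want it to stick.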
Is there any real notable difference between 3090s and 4090s when running AI? Great video btw
Thank you! I do not have personal experience with a 4090 for any AI-related tasks, so take what I say with a grain of salt. My understanding is that while there will be a speed differential in favor of the 4090 for certain tasks, the VRAM is the important part in terms of being able to "fit" the thing being run. To simplify, both cards would be able to run the same "items", though the 4090 would be faster in terms of generation speed. Another example is offline video gen stuff like Open Sora: the bare minimum generation needs a 24GB card, so both the 3090 and 4090 would be able to generate a result, though speeds may differ. In terms of training and multi-card setups, there appears to be a lot more to consider beyond the speed differential of the two cards, such as memory bandwidth, a CPU with support for the necessary number of PCIe lanes, etc. The r/LocalLLaMA subreddit is a very good resource, as a lot of folks there have experience with these sorts of builds and setups. If the budget is there, I would go with the 4090(s), though with that said, a great many of us are happily having fun and running neat things with 3xxx series cards as well.
For some large LLMs, the difference lies in whether you can run the model at all, depending on whether you have 24GB or 48GB of memory.
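To put rough numbers on the 24GB vs 48GB point, here is a back-of-the-envelope sketch of weight memory for a quantized model. It only counts the weights (no KV cache, activations, or framework overhead), so treat the results as a lower bound.

```python
# Rough VRAM estimate for model weights only; real usage is higher once
# you add KV cache, activations and framework overhead.

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GiB for a given parameter count and quant width."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1024**3

for bits in (16, 8, 4.5, 4):
    print(f"70B at ~{bits} bits/weight: ~{weight_gib(70, bits):.0f} GiB")
```

That works out to roughly 130 GiB at 16-bit, 65 GiB at 8-bit, and around 33-37 GiB in the Q4 range, which is why a ~Q4 70B model fits across 2x 24GB cards while the full-precision version does not.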
Is heat an issue for you? Considering getting another 3090 for my build, but worried about overheating. Do they really generate that much heat during inference?
Yes, it was, and yes they do, though training is what made them too hot for me to be comfortable with. I ended up watercooling both of them and they run much, much better. That said, the case I had them in was from the late 2000s and was not designed to provide adequate airflow to a setup like this, so I believe that had I air cooled them with a proper modern case and fans, it may not have necessitated going liquid cooled.
Ok, so I am very interested in local LLMs and found that my system is way too weak for my liking. But I really have to ask: what are you doing with this technology? I have no "real" use case for it and wouldn't consider buying two new GPUs for it. What are the actual beneficial use cases for it? Maybe coding?
I have a business that utilizes LLMs for some of my products, so it is a 50/50 split between business-related research and hobbyist tinkering. The requirements to run LLMs locally are heavily dependent on the type and size of model you want to run. You don't need a large VRAM setup like this to fool around with them; I just went for this so that I could run larger models like 70B models. Some of the smaller models would run fine on an older card like a 3060, which can be had without breaking the bank. Some of the model "curators" post the VRAM requirements for their models on Hugging Face, bartowski being one who lists them.
@@OminousIndustries thank you for the insights really appreciate it
@@M4XD4B0ZZ Of course!
Does the dual GPU use NVLink bridge, Bro?
It doesn't. Now that the cards are watercooled I do not believe the bridge would even fit on them either so I will likely never have it installed.
@@OminousIndustries Ok, thanks! This is very inspiring!
Are you able to run Llama 3.2 90B, or does this exceed the available VRAM?
I have not tried that. I believe hypothetically I could run it in some form of quant, but I don't know if it would be one worth using, as I have read that things seem to get a bit sketchy output-wise below Q4.
What motherboard are you using? I need to find one that will fit two big cards like yours.
It is an MSI Pro Z690. With that said, I think any ATX mobo with multiple PCIe slots would accommodate the two cards. Be mindful of temps with a setup like this however, as the cards were getting uncomfortably warm during training.
@@OminousIndustries Hi, what cpu are you using?
@@jesusleguiza77 It is a 12th gen i7
Hi, I have 2x 3090 on an Asus Crosshair X670E Hero; could you show me how to enable NVLink please?
I unfortunately have not yet NVLinked these cards, so I won't be much help with this. I would suggest heading over to one of the ML-adjacent subreddits like r/LocalLLaMA, where I'm sure at least a few people have gone through those steps and could help you out!
NVIDIA RTX NVLink Bridge P/N: NVRTXLK2 or NVRTXLK3
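For what it's worth, there is no software toggle to "enable" NVLink on Linux beyond installing the physical bridge and a current driver; you mostly just verify that the driver sees the link. A hedged sketch of the two usual nvidia-smi checks, wrapped in Python for convenience:

```python
# Hedged sketch: verify NVLink after installing a physical bridge.
import subprocess

# Per-link status; active NVLink lanes show up here with their speeds.
subprocess.run(["nvidia-smi", "nvlink", "--status"], check=True)

# Topology matrix; an "NV#" entry between GPU0 and GPU1 means NVLink is in use,
# while "PHB"/"SYS" means traffic is going over PCIe instead.
subprocess.run(["nvidia-smi", "topo", "-m"], check=True)
```

Frameworks like PyTorch should then pick the link up automatically through NCCL when doing multi-GPU work.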
Do you use this setup just for running 70B LLMs, or for fine-tuning?
Just for 70B models initially, but I now use it a lot for SD and other random experiments, which I suppose don't necessarily need two cards, but oh well lol
@@OminousIndustries Thanks for replying man, nice info. I was asking because I want to have the same setup lol, but with dual RTX 3090 non-Ti.
@@ALEFA-ID Of course! It is an awesome setup to have regardless of whether the cards are Ti or not, and 48GB builds are sick! Check out r/LocalLLaMA on Reddit as well if you're interested in the community based around home setups like this.
@@OminousIndustries Yeah that's so great, can't wait to have that setup. Thanks again dude!
@@ALEFA-ID Of course! I am excited for you hahah if you have any other questions or anything feel free to reach out anytime! I just got two waterblocks shipped so I will be posting a watercool build video soon(ish) :)
I have the same setup. I use to make AI furry p0rn.
Wouldn't 4x P40 be cheaper and better performance-wise?
It would only be better in terms of how large a model I could run. They are slower at running the models, and having four separate cards would have added additional considerations like a new mobo and having to deal with linking them all to use the VRAM with whichever service I was going to use them with.
How'd you increase your swap file? I have the same issues with 72B models running dual 3090s
These instructions should work, though I have only used them on 22.04: wiki.crowncloud.net/?How_to_Add_Swap_Space_on_Ubuntu_22_04#Add+Swap
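The gist of that guide, for anyone who doesn't want to click through: allocate a swap file, lock down its permissions, format it, and turn it on. A hedged sketch of those same steps driven from Python; the 64G size is only an example, and it assumes Ubuntu with sudo available.

```python
# Hedged sketch of the standard Ubuntu swap-file steps (needs sudo/root).
# The 64G size is illustrative; size it to your RAM shortfall and free disk.
import subprocess

steps = [
    ["sudo", "fallocate", "-l", "64G", "/swapfile"],  # reserve the file
    ["sudo", "chmod", "600", "/swapfile"],            # restrict permissions
    ["sudo", "mkswap", "/swapfile"],                  # format it as swap
    ["sudo", "swapon", "/swapfile"],                  # enable it immediately
]

for cmd in steps:
    subprocess.run(cmd, check=True)

# To keep the swap file across reboots, append this line to /etc/fstab:
#   /swapfile none swap sw 0 0
```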
Hello! I was trying to find 3090 cards but only found different brands (Galax and EVGA). Would that affect performance, or should I keep searching for the same model of card?
I can't say for sure as I have not personally tried with different cards, but I will say that I have seen people combine multiple cards of the same family, like a 3060 + 3090, and make it work, so I wouldn't think there would be any issue. Different cards can have slight differences like clock speeds, etc., though I don't believe that would make a noticeable difference.
@@OminousIndustries Thank you for the answer!
@@Chilly.Mint. Sure thing! Be sure to check out the localllama subreddit as well, it contains a lot of personal experience from people running setups like this one.
What is the aim for using OpenDalle? Is it just... for fun, or is there some monetary gain to be had through this?
Personally I just use it for fun. Some people use these uncensored image models to generate NSFW images that they then release on Patreon, etc. to make some money, but that is not in my wheelhouse.
Where did you buy your card?
I got it at Micro Center, they were selling them refurbished. Not sure if they still have any in stock. They also had 3090s.
What CPU and motherboard? What is the temperature of the cards? Thanks!
The CPU is an i7-12700K and the mobo is an MSI PRO Z690-A. I purchased them as a Micro Center bundle about a year ago. I have not seen the card temps get over about 75C when using text-gen-webui. I was using KeyShot Pro for something and decided to use both cards to render the project and they got far too hot, so cooling is the first priority to be upgraded.
@@OminousIndustries Okay thanks. Yeah, there's not much space in that case. I have a bigger case; I'm looking to get another 3090 or 4090 and possibly water cool them. Would be nice to get an A6000 but that's too much right now.
@@codescholar7345 I have a Thermaltake View 71 to swap them into when I get the time. The A6000 would be awesome, but yeah, that price could get you a dual 4090 setup. A water cooling setup would be very cool and a good move for these situations.
@@OminousIndustries Hi, did the PCIe slots end up working at x8 and x8? Regards
Also curious on this one: does the MSI PRO Z690-A support PCIe bifurcation at x8/x8?
Hi Ominous, I'm looking for a good spec to train LLMs/AI (and game sometimes) with a budget of $3000. Is 2x 3090 Ti 24GB the best option for my budget?
I believe the consensus is still that dual 24GB cards (like the 3090) are the best "budget move"; however, I would head over to www.reddit.com/r/LocalLLaMA/ and browse/ask there, as there are a lot of knowledgeable people there who can provide good insight on this. For what it's worth, I don't believe it will make a huge difference if you get non-Ti cards; I just bought them because I wanted to match the first card I had, which happened to be a Ti.
So, any trouble?
Still going strong, the cards are now water cooled as they were getting too hot being that close together for certain tasks.
What about llama3?
I tested a small version of it in one of my more recent videos!
A Cooler Master CPU cooler is proper.
It definitely is, had to keep it simple this time, though!
next: make aquarian
u should totally let me buy ur build
I'm in the middle of building a custom loop and I'm about ready to toss it out the window so maybe hahaha
@@OminousIndustries dude ong id buy it
@@Jasonlifts I finished it and am happy with it again lol
BIG price... I guess 200 bucks too high.
I think your build is slightly undersized for that model...
Do you mean the physical components, or for a 70B model? It runs Q4 EXL2 quants of Llama 3 70B very well and at a decent speed.
@@OminousIndustries Well, I am jealous! I want to build a rig. I thought you needed at least 140 gigs of VRAM for a 70B model...
@@braeder It's a lot of fun to have. Not necessarily: quantization essentially removes some of the "precision" of the model, but in turn allows it to be a much more manageable size so it can be run on less horsepower. There are tons of quants of many models on Hugging Face, so it's pretty good pickings to find something you like that will fit on a specific setup.
@@OminousIndustries I see! I am running a 30B on my CPU right now... taking forever! I want a script that can offload some calculations onto my GPU for a program I am writing. But building a GPU server will really open up some possibilities! I was thinking of used K80s.
@@braeder Yes, the CPU will struggle due to the way the models are run. The older Tesla cards are good, but there could be potential compatibility issues with some libraries and things like that, though I can't say for certain without experience with those cards. I think having a couple of 12GB 3060s would open up a lot of possibilities, as they will allow you to use most common current libraries and such. For navigating this space, this subreddit is a really good resource to find info on good setups and other related things: www.reddit.com/r/LocalLLaMA/