Honestly it's so great seeing someone as nerdy as me lol
This looks like a crazy build, and it's giving me ideas for my own server I'm trying to make.
Great video!
Haha I am happy to see I am not the only one nerding out over this stuff :D Thanks and have fun with your build!
Looks and runs amazing. I love the "RETRO" case and fan controller. Total sleeper build. I'm excited to see what you have your agents doing. Amazing build.
Thanks very much, it will be doing some random tasks and perhaps some funny ones hahaha
I love your organic content. Please keep on posting these
Thanks very much for the kind words, I will indeed!
2 x 3060 12GB vs a single 24GB card is no contest on price per GB VRAM. Great LLM or Stable Diffusion machine for sure
I totally agree, these things are far exceeding my expectations tbh. I may have chosen to use a bunch of these instead of the 3090s had I purchased a 3060 first.
Not if you get a used 3090 for 600
@@soumyajitganguly2593 Can be risky though to get the cheapest used card from a random seller.
Sadly Stable Diffusion can't use both GPUs to increase total VRAM, but it can use them in parallel, like with Swarm. For LLMs it's legit
@@soumyajitganguly2593 😂 keep dreaming, boy
Love your videos.
Thank you very much!
Great video!
Thanks very much!
You had the exact same idea as me 😂 I was searching to see if anybody had benchmarked running two 3060's for local LLM's. I'm getting ready to build a similar setup for my home server.
Great minds think alike hahah, it's a very potent and cost-effective setup considering you can still get the hardware for it brand new!
Great video, I also want to do some ML.
Thanks very much. It can be a lot of fun (and frustrating too hahaha)
The Asus Prime Z790-P board allows the installation of 4 video cards, right?
I am not entirely sure tbh. I suppose it would be possible if the cards didn't physically hit one another, but I personally wouldn't bother putting more than 2 in a non-Threadripper/EPYC/Xeon setup.
Why didn't you test larger models? A single 3060 can run those models too
Just what I decided to test haha
That laugh 00:52 =)
Hahaha
Hi, thanks for the video. Do the 2 GPUs need to be the same brand and specs?
No problem! It's best to research the pairing you would like to use, but from everything I have seen, no, they can be different models of the same card or even different cards altogether.
@@OminousIndustries Can two 4060s work? 24GB each, giving me a total of 48GB VRAM
@@dpno Yes, 2x 4060 will work, but there are no 24GB 4060s; the only 24GB consumer cards are the 3090/4090. The Titan RTX exists too, but idk if that counts as a consumer card.
Great video. Looking at the MB, only the first PCIe slot is x16? Do both 3060 12GB cards run at full speed in your setup? I plan to use this for an Unraid server with Ollama running. I have not found a follow-up video on this setup in your list. Looking forward to your update. Thanks
If you just want to run models on 2 cards through Ollama you should be fine without needing to worry about lane speeds. I have read that PCIe lanes and bifurcation become more of a consideration for training, but much less so for inference.
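If it helps, here's a minimal sketch (not from the video) of how you might sanity-check inference speed through Ollama's HTTP API, assuming a default local install on port 11434 and the requests library; the model name is just an example, and the eval_count/eval_duration fields assume the current API response format. While it runs, nvidia-smi should show VRAM in use on both cards once the model is too big for one.

```python
# Minimal sketch: time a prompt against a local Ollama server. Assumes the
# default endpoint on port 11434 and the requests library; the model name is
# just an example of something too big for a single 12GB card.
import time
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"
MODEL = "qwen2.5:32b"  # example; use whatever you actually run

payload = {
    "model": MODEL,
    "prompt": "Explain PCIe lanes in one paragraph.",
    "stream": False,
}

start = time.time()
resp = requests.post(OLLAMA_URL, json=payload, timeout=600)
resp.raise_for_status()
data = resp.json()

# eval_count / eval_duration (nanoseconds) give a rough tokens-per-second figure
tps = data["eval_count"] / (data["eval_duration"] / 1e9)
print(f"Finished in {time.time() - start:.1f}s, ~{tps:.1f} tokens/s")
print(data["response"][:300])
```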
Nice! Please excuse my noobish questions... if someone can help. Is Zotac the only 3060 that supports 2x, and was it just plug and play? I don't see a physical connection between the two cards.
Not a noobish question at all! For the workload I am using this machine for (running LLMs), the cards do not actually need a physical connection to one another like the SLI/NVLink/Crossfire technologies you may be familiar with. As long as the cards have a spot on the motherboard and the PSU can power them, they can both be used in tandem for LLMs and some other AI workflows. There are considerations of course, like making sure the cards are suitable for running LLMs, but as long as they are and you have enough space on the mobo and enough wattage on the PSU, you are good to go!
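If you want to double-check that the machine actually sees both cards before pointing an LLM runtime at them, here's a quick sketch assuming a CUDA-enabled PyTorch install; no SLI/NVLink bridge is involved at any point.

```python
# Quick sanity check (assumes a CUDA-enabled PyTorch install): list every GPU
# the system exposes along with its VRAM. No SLI/NVLink bridge is needed for
# LLM runtimes to use the cards together; they just need to show up here.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA devices visible - check drivers and power cables.")

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.1f} GiB VRAM")
```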
So do you notice any speed difference on the second 3060 compared to the first one? That motherboard only supports one GPU at x16 from the CPU; the other slots are x4 via the chipset.
When I was researching this question for a different dual-GPU build, I read that for inference it does not make a huge difference once the model is loaded. If I were training or doing other non-inference tasks it might become more of a consideration, but for the purposes of this build it does not seem to matter.
@@OminousIndustries I'm adding a 3060 12GB to a 4-lane PCIe slot for the same reason. The AI use is for small models and learning, and the bandwidth isn't going to be saturated by those tasks. Every source I can find has the same basic opinion. It won't be used for games or rendering; the card in the main slot will do that. I actually ordered that card the day before yesterday. I was happy to find this video today.
@@blisterbill8477 Apologies as I just saw this response! Good luck with the build and enjoy!
What's the largest model that you can fit on it?
While maintaining a decent level of performance, something in the high-20B to mid-30B parameter range would be a good choice.
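For anyone curious where that range comes from, here's the rough back-of-the-envelope math I'd use; the bits-per-weight and overhead numbers are assumptions, and real usage varies with the quant format and context length.

```python
# Rough VRAM estimate for a quantized model on 2x 12GB cards. The bits-per-weight
# and overhead values are assumptions; actual usage depends on the quant format,
# context length (KV cache), and runtime overhead.
def estimate_vram_gb(params_billion: float, bits_per_weight: float = 4.5,
                     overhead_gb: float = 2.0) -> float:
    weights_gb = params_billion * bits_per_weight / 8  # GB for the weights alone
    return weights_gb + overhead_gb

total_vram_gb = 2 * 12  # two 3060 12GB cards

for size in (14, 27, 32, 70):
    need = estimate_vram_gb(size)
    verdict = "fits" if need <= total_vram_gb else "too big"
    print(f"{size}B @ ~4-bit: ~{need:.0f} GB -> {verdict} in {total_vram_gb} GB")
```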
I'm guessing the 3060 doesn't have NVLink. I want another Strix 3090 with NVLink 😅
I have read that NVLink doesn't help much unless you are using the cards to train. As I have not personally tested this, take it with a grain of salt.
A single 3090 is way better than this though, right? Because 2x 12GB doesn't equal 24GB for running AI, if I'm not wrong.
But the price tho
@@Elbis01 I bought a used one for 600 euros a while ago, but now it seems like people are asking for 800 or more. The AI boom is really in full swing.
Not necessarily, no. Something like Ollama, which is my use case for this system, will split the model across multiple cards, so it will still "fit" the same as it would on a single larger card. It will be a bit slower, but being able to get two brand new cards for $500 that can effectively hold the same model as a 3090 is a better deal for my use case here than buying a used 3090 for 700 or so. If you were looking to run tasks aside from LLMs, like offline video generation, some of the libraries don't play as nicely with splitting across cards, so having a single larger card is a better option in that case.
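If anyone wants to see the split for themselves, here's a small sketch using the NVML Python bindings (pip install nvidia-ml-py) that prints per-card memory use; run it while a model bigger than 12GB is loaded and both cards should show usage.

```python
# Sketch: print per-GPU memory usage so you can watch a >12GB model being
# split across both cards. Assumes the NVML Python bindings are installed
# (pip install nvidia-ml-py); run it while Ollama has the model loaded.
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        handle = pynvml.nvmlDeviceGetHandleByIndex(i)
        name = pynvml.nvmlDeviceGetName(handle)
        mem = pynvml.nvmlDeviceGetMemoryInfo(handle)
        print(f"GPU {i} ({name}): {mem.used / 1024**3:.1f} / {mem.total / 1024**3:.1f} GiB used")
finally:
    pynvml.nvmlShutdown()
```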
Do you need a higher-wattage power supply when running 2 GPUs?
Yes you do.
How can this setup be beneficial in programs like Stable Diffusion when technically there is no difference in the output compared to running just one similar GPU? As far as I know you need a single GPU with more VRAM to affect the output or image generation.
Truth be told I am not sure about the effectiveness of this with something like SD. My main purpose with this machine was just to be able to run LLMs across both cards, which something like Ollama will do automatically.
@@OminousIndustries Learned something new. I thought that all, if not most, AI programs or processes use GPUs similarly to how SD does.
Help! I would love to add a second RTX 3060 12G card to my Windows PC. Do I have to run Linux? Will the two cards be recognized by A1111 or Forge UI? Can Flux benefit? So many questions...
It can be a very overwhelming world to dive into hahah. I have used 2 GPUs in Windows for some KeyShot rendering, so you shouldn't need to worry about having to run Linux to get them working. In terms of using them both for Stable Diffusion tasks, I have not personally used more than 1 card for image generation so I can't definitively answer. I do not believe it is as simple as it would be if you were using them both to run an LLM.
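For what it's worth, the common workaround for image generation isn't splitting one model across cards but pinning each job to its own GPU. Below is a hedged sketch with the diffusers library; the model ID and prompt are just placeholders, and A1111/Forge have their own ways of selecting a device.

```python
# Sketch of the usual multi-GPU workaround for image generation: instead of
# splitting one model across cards, pin each instance to its own GPU. Assumes
# the diffusers library; the model ID and prompt are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1", torch_dtype=torch.float16
)
pipe = pipe.to("cuda:1")  # run this instance on the second card

image = pipe("a beige retro sleeper PC on a workbench").images[0]
image.save("output.png")
```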
I plan on doing the same build. Already have one card. Can you run an LLM larger than 12 gigs?
It's an awesome build for LLMs tbh. Yes, you should be able to; a friend showed me Qwen2.5 32B running on 2x 3060 12GB, which is awesome.
@OminousIndustries Exactly my goal! 32B is good but I need a second card
@@MrI8igmac That will be an awesome setup then!
@OminousIndustries I'm getting a second 3060 now 😊
Dude. I'm running two RTX 3060s on a B550 MB, Ryzen 3700 8-core, 650-watt power supply. Qwen Coder 32B is crazy fast 😊
No SLI bridge for the 3060s???
I don't believe they have that option for these cards, but I'm not sure. It wouldn't make a large difference for the use case of this machine, which is just to run models through Ollama.
@OminousIndustries oh ok makes sense 👍
I just got a PC with a 4070; would adding a 3060 12GB be compatible with Flux 1 Dev?
Truth be told I am unsure about using multiple GPUs with something like Flux; I have only ever tested it on a single 3090 Ti. I found this, which may be of some relevance to you: www.reddit.com/r/StableDiffusion/comments/1el79h3/flux_can_be_run_on_a_multigpu_configuration/
As far as I know, there aren't any backends or UIs that split Flux or other diffusion models across multiple cards, and even if they did, I doubt it'd work with two cards that aren't the same (if someone knows differently, please let me know). If you're serious about running Flux better, just get a 4080 or 4090 and sell the 4070.
@ Oh that makes sense, I'll look into it more and see if I find something. Sick video though, keep it up!
@@rik1627 Thanks very much, good luck with it!
@@xilixschnell The RTX 3060 12GB doesn't have SLI, so you can't combine the VRAM. The 3090 is a killer deal.
For running LLMs locally using Ollama, which is the use case for this system, it can just split the model across the two cards, so it was a cheaper way to get 24GB of VRAM without spending a couple hundred more for an unknown used card.
I think Ollama doesn't support distributed processing right now
@@OminousIndustries Wow, why can't other generative AI programs like Stable Diffusion build an architecture like this to get around Ngreedia's chokehold on the whole AI industry?
@@OminousIndustries Your demo showed that the model ran on one card only. You'd have to run a larger model to test your split-model theory.
Should have gone with AMD. AMD really does work better, and you don't have to worry about those flawed CPU dies and vulnerabilities. A Ryzen 9 7900X (12-core/24-thread) and an undervolt result in a beast of a CPU.
I don't believe the 12th gen has the issue, and going AMD would have cost me a few hundred more for a machine that doesn't really need a beefy CPU and is mostly focused on the GPUs.
Why do you need such dedicated machinery for AI? You were going to try some random stuff; you could do that with your own PC, even with one GPU. If you're doing this for work then this makes sense.
This machine was purpose-built for tasks that require this sort of setup, with the dual GPUs, etc.