INSANE Ollama AI Home Server - Quad 3090 Hardware Build, Costs, Tips and Tricks

  • Published Oct 25, 2024

Comments • 188

  • @TuanAnhLe-ef9yk
    @TuanAnhLe-ef9yk หลายเดือนก่อน +3

    Do you have any recommendations for an air cooler for a CPU?

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      what is the specific CPU part number?

    • @TuanAnhLe-ef9yk
      @TuanAnhLe-ef9yk หลายเดือนก่อน +1

      @@DigitalSpaceport I’m going to purchase the 7702p CPU, just like your specifications.

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      I use the Noctua SP3 cooler on another Epyc Rome system, a 7B12. It is very tall but should just barely clear the GPU rack, by about 10mm, from a measurement I just did. geni.us/NoctuaSP3_CPUCooler
      Alternatively, the D9 also fits SP3 and is lower profile at 92mm; it is featured in this video on my 7995WX. I like it, but it's noticeably louder under load, and the 7995WX is the hottest chip to my knowledge. If it ramps the fans to full blast, it's not quiet at all. th-cam.com/video/YQZ2HWonnGA/w-d-xo.htmlsi=d5CtIYrhjHF0D5AB

    • @TuanAnhLe-ef9yk
      @TuanAnhLe-ef9yk หลายเดือนก่อน

      ​@@DigitalSpaceport Thank you for clarifying. My expectation was for the air cooler to be quiet.

    • @thomastoseland7113
      @thomastoseland7113 9 วันที่ผ่านมา

      @@TuanAnhLe-ef9yk To be honest I think it's crazy. I used to run quad GPUs from 2 cards, and even that was difficult enough with power and cooling.
      I'm not even sure that 4 RTX 3090s will scale well for machine learning.

  • @johnbell1810
    @johnbell1810 หลายเดือนก่อน +14

    The budget of 5K for that build is awesome; at first I thought we were looking at around 10K.

    • @johnbell1810
      @johnbell1810 หลายเดือนก่อน +2

      On a side note, I am having trouble sourcing used 24GB 3090s in my area.

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน +2

      Thanks, I also think it's very cheap for what it is. I couldn't get as much bang for the buck any other way.

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน +1

      I'm in a fairly large metro and my local sellers want more than eBay. I pointed this out to one who had a similar 3090 Ventus for about 75 more, thinking they would be like, okay, I'll go to 600, and they refused to match eBay. Totally their choice, but local sellers being way high on prices and unreasonable in negotiating is a recurring theme recently.

    • @FabricioMTL
      @FabricioMTL 23 วันที่ผ่านมา +1

      What about electricity bill

    • @DigitalSpaceport
      @DigitalSpaceport  4 วันที่ผ่านมา +1

      About $275/mo. We bought this house with electric rates in mind, as everyone doing any form of HPC should, in my opinion. It can go as high as $800/mo if I'm really cranking flops, but that's a fraction of what cloud costs would be. A not-small part of this is production for my business.
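
      For anyone budgeting this, a quick back-of-the-envelope way to estimate the bill (the wattage and rate below are illustrative assumptions, not measurements from this rig):

          # rough_power_cost.py - illustrative numbers only
          avg_draw_w = 1200          # assumed average wall draw for a quad-3090 rig under mixed load
          rate_per_kwh = 0.10        # the co-op rate mentioned elsewhere in these comments, $/kWh
          hours_per_month = 24 * 30

          kwh = avg_draw_w / 1000 * hours_per_month
          print(f"{kwh:.0f} kWh/month -> ${kwh * rate_per_kwh:.0f}/month")
          # ~864 kWh -> ~$86/month at $0.10/kWh; roughly triple that at $0.30+ rates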

  • @paulumz
    @paulumz หลายเดือนก่อน +2

    LOL when you moved the camera from your 200W "reasonable power draw" on the rig to that insane server rack probably drawing several kilowatts. Nice video!

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      You do have a point there 😅

  • @Drkayb
    @Drkayb 2 หลายเดือนก่อน +3

    Good stuff, man. Looking forward to what the performance will be like.

  • @samihenrique
    @samihenrique 2 หลายเดือนก่อน +4

    This is exactly what I want to watch!!!!!!

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      Sweet! Glad this build floated your boat. The next build video is the smaller guy, filming that now.

  • @arturschuch
    @arturschuch 2 หลายเดือนก่อน +9

    Really cool build, I also have a 4-GPU rig.
    The only thing I would recommend is giving the GPUs as much space between them as possible, because when they sit close to each other they generate a lot of heat; the difference is enormous.
    I would also add extra fans for the GPUs. I personally like a maximum of 1 GPU per 120mm fan, with the fans blowing air directly at the GPUs.
    I'm not sure a water cooler is a good idea here. I say that because no server uses water cooling, and neither do CPU miners (people running the CPU at 100% 24/7), because water coolers tend to stop cooling effectively at some point and they don't have the best efficiency. I'm also not sure a 120mm fan will fit in your build, I'm just giving food for thought.

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +3

      Yeah, the water cooling is working fantastically at keeping the CPU cool under workloads for really large models, but I'm fast becoming badly annoyed with those models. Too dang slow. The 3090 non-Ti also generates a lot of heat on the back. I'm working on placement to help with the heat load as well, and I'm likely to have a video on that at some point. I've got a Vornado knockoff right now that hits the board and keeps the NVMes cool, but redoing the whole layout of the mini datacenter is very likely.

  • @JoeVSvolcano
    @JoeVSvolcano 2 หลายเดือนก่อน +23

    WoW, lovin this build! I built my Llama 3 setup on my Proxmox host with a single RTX 3090 passthrough. It gets pretty hot; I can only imagine what kind of heat load you're pumping into that room...

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +4

      I have active cooling but yes, that has limits. I have some ideas on placement that will hopefully help; I'll be testing them in the next video. Rearranging the area a bit now.

    • @ewenchan1239
      @ewenchan1239 2 หลายเดือนก่อน +3

      It depends on your workload and whether you're running a model or training a model.
      If you're just running a model, unless you're constantly peppering it with requests, it might not get all that hot.
      (I'm running a dual 3090 setup with the open-webui, which uses the Ollama backend, running the Codestral 22b model, and it only spins up when I ask it something or type a response back, but then in between that, the GPU sits at idle.)
      If you're TRAINING a model, then that's a different story.

    • @jksoftware1
      @jksoftware1 หลายเดือนก่อน +3

      I have 2 RTX 3090's in a AMD 5900x system and they thermal throttle because of the spacing. Once I get the PCIE extension cables and install them in a new case that should solve the problem.

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      @@jksoftware1 yeah the 3090 is a really fat card. I toyed with the idea of water cooling but it was much cheaper to just use the risers. The risers are still too much imo also.

    • @ewenchan1239
      @ewenchan1239 หลายเดือนก่อน +1

      @@jksoftware1
      I have two 3090s in an Asus Z170-E motherboard with a 6700K and 64 GB of RAM.
      The two slots drop down to a x8/x8 configuration, so it makes it more difficult to push a hard enough load onto the GPU to get it to thermal throttle.
      (And that's with running both InvokeAI and open-webui with the codestral:22b LLM models simultaneously.)
      If I need to space them out, I can use some of the GPU mining hardware that I still have. The PCIe x1 link back to the motherboard won't be great for bandwidth, but it will provide ample spacing for cooling.

  • @taxplum4858
    @taxplum4858 หลายเดือนก่อน +4

    Nice build! Have you run many training workloads on it?
    The single-core perf of the 7702, even with boost, is pretty mediocre. I fear it would bottleneck training unless you spend a bunch of time optimizing data-loading code. I went with a Threadripper Pro for my 4x 3090 for this reason, but always wondered how a 7702 would perform.
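
    On the data-loading point, a minimal PyTorch sketch of the usual mitigation, more loader workers plus pinned memory so a single slow core matters less (the dataset, shapes, and worker count are placeholders, not anything measured on this build):

        # loader_sketch.py - hypothetical example of hiding mediocre single-core perf behind parallel data loading
        import torch
        from torch.utils.data import DataLoader, TensorDataset

        dataset = TensorDataset(torch.randn(2_000, 3, 64, 64), torch.randint(0, 10, (2_000,)))
        device = "cuda" if torch.cuda.is_available() else "cpu"

        loader = DataLoader(
            dataset,
            batch_size=64,
            shuffle=True,
            num_workers=8,            # spread preprocessing across cores so one core isn't the bottleneck
            pin_memory=True,          # page-locked host memory speeds up host-to-GPU copies
            persistent_workers=True,  # keep workers alive between epochs
        )

        for images, labels in loader:
            images = images.to(device, non_blocking=True)  # overlap the copy with compute
            # ... training step would go here
            break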

  • @bekappa488
    @bekappa488 2 หลายเดือนก่อน +5

    that odd fan is making me go crazy LOL

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      I know but I spent all my money on GPUs and Pads

  • @whosestone
    @whosestone 7 วันที่ผ่านมา

    Sweet.
    Gonna build this, my old school dual C2070 is now a dinosaur.

    • @DigitalSpaceport
      @DigitalSpaceport  6 วันที่ผ่านมา

      I had to look that GPU up. Does Fermi still work for ollama?

  • @thebasicmaterialsproject
    @thebasicmaterialsproject 3 ชั่วโมงที่ผ่านมา

    Cool build, love the spreadsheets. You should project the cost of ownership: at what point does it become too expensive to own? Is there perhaps undervolting potential to bring power draw down without adversely affecting performance too much?
    How can you make money out of it? Can you bridge the gap from your website to this awesome beast and share time against your AI? Is there a particular branch of AI that is more cost-effective than another that might not be landing on the sweet spot? Could you add an ASUS RAID card, or perhaps another network card, say 10GbE? Can processes be re-routed to avoid CPU and RAM bottlenecks?
    Good job, want more.

  • @isbestlizard
    @isbestlizard 2 หลายเดือนก่อน +2

    Nice one! I built an 8x A4000 Epyc server which was... epic! 128GB VRAM

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +3

      The A4000 has such a nice single-slot format and 16GB of VRAM, it's a great card. What are the biggest models you like to run on that 8x? I have 1 A4000 and 2 A5000s but am thinking of selling those for more 4090s.

  • @B_r_u_c_e
    @B_r_u_c_e 7 วันที่ผ่านมา

    Thank you. Looking forward to thermal paste report.

  • @joshwilson8501
    @joshwilson8501 2 หลายเดือนก่อน +3

    Nothing wrong with zip ties. Sweet rig!

  • @kr00tman
    @kr00tman หลายเดือนก่อน +1

    When it comes to tabs, I am your wife to a T lolol, thanks for the shoutout. Loved the video!

  • @bishop838
    @bishop838 2 หลายเดือนก่อน +4

    Powering each 3090 with a single 8-pin via a piggyback connector? I thought the 8-pin standard had a max wattage of 150W, and even though you're going to use Afterburner (or equivalent) to reduce it, you stated an anticipated 275W. I just finished a 4x 4090 Chia miner setup and I'm going to have to use Afterburner to reduce the power, as three 4090s on a single Asus ROG Thor 1600 trips the internal breaker and shuts off the PSU. Will be interesting to see how your four 3090s and the Corsair 1500 handle something similar with the additional draw from an Epyc processor/motherboard combo. You may need to add a second PSU to perform as desired. Thumbs up! - I see the Phison drives also, nice touch.

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +4

      Okay, I'm going to research that more here. I'm not aware of it, and none of the cables are warm; I just touched 'em under load for about an hour. Still a good thing to know.

    • @hienngo6730
      @hienngo6730 2 หลายเดือนก่อน +3

      @@DigitalSpaceport, @bishop838 is correct. 150 W per cable for the PCIe connector is the official spec. You also get 75 W from the PCIe MB connector. Some of my power supplies list 200 W per cable max (even with two connectors), so if you can limit your GPU power to ~225 - 275 W, you'll be under the limit. If you're just running LLMs doing inference or even Stable Diffusion/Flux image generation, though, you should be fine even with the current setup. Unless you're doing training or fine-tuning that runs the GPUs at 100% continuously, you're unlikely to trip any breakers or brownout your power supply.
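
      A minimal sketch of applying that kind of per-card cap from a script (it assumes the stock nvidia-smi CLI is installed and that this is run as root; 275 W just mirrors the figure discussed above):

          # set_power_caps.py - hypothetical helper; needs root and the NVIDIA driver's nvidia-smi tool
          import subprocess

          POWER_LIMIT_W = 275  # per-GPU cap discussed in this thread

          def gpu_count() -> int:
              out = subprocess.run(
                  ["nvidia-smi", "--query-gpu=index", "--format=csv,noheader"],
                  capture_output=True, text=True, check=True,
              ).stdout
              return len(out.split())

          for idx in range(gpu_count()):
              # -i selects the GPU index, -pl sets the software power limit in watts
              subprocess.run(["nvidia-smi", "-i", str(idx), "-pl", str(POWER_LIMIT_W)], check=True)
              print(f"GPU {idx}: power limit set to {POWER_LIMIT_W} W")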

  • @treniotajuodvarnis5503
    @treniotajuodvarnis5503 2 หลายเดือนก่อน +21

    Why limit the GPUs' TDP?!? Just add another PSU! 4x 3090 is 1400W already! 512GB of RAM and the 7702 CPU are another 500W, so one more PSU, 750W minimum; it costs nothing compared to the price of the system. And with 1500W you don't want to run at the max limit, keep a 20% reserve. If you want a stable, reliable system, your GPUs have to be limited to 150W instead of the default 350W, and that's a huge hit!

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +7

      I'll likely move it into the server racks at some point, for power-draw reasons. I'm also likely to get a DC power board for it then, but I've got a lot of gear rearranging to tackle before it gets to that point. The CPU/RAM doesn't cross 250W from what I've seen on my 7B12/H12 combo unless I'm running the CPU hard, which this type of workload doesn't seem to do so far. I also don't trust the dual-PSU adapter kits; I've known a guy who mined ETH hard and had several burn up.

    • @treniotajuodvarnis5503
      @treniotajuodvarnis5503 2 หลายเดือนก่อน +3

      @@DigitalSpaceport From my observations, Llama uses the CPU and RAM at first to compile/load, then the GPUs. I asked Llama how it (he/she) works and got a reply confirming it :)

    • @ukrainian333
      @ukrainian333 หลายเดือนก่อน +2

      Very good point, btw

    • @claxvii177th6
      @claxvii177th6 14 วันที่ผ่านมา

      @DigitalSpaceport Also, for LLMs, peak performance is less important. The VRAM is what is golden about that setup, right? I limit the power of my 3090s on my grid just so they don't reach the highest temps in my fairly small case (I am at my financial limits; $800 of used components is what I could afford).

  • @themarksmith
    @themarksmith หลายเดือนก่อน

    Great video - subbed!

  • @thanadeehong921
    @thanadeehong921 5 ชั่วโมงที่ผ่านมา

    Your video is great!
    There is a resale Asus Prime TRX40 Pro at a pretty decent price. Do you think it would do the job as well? The board comes with 3x PCIe 4.0 x16 and may need a splitter to accommodate 4 GPUs.
    In addition, I already have 6x RTX 3090. Do you think it would be beneficial to utilize all 6 GPUs? I plan to go for Llama 3.1 70b.

    • @DigitalSpaceport
      @DigitalSpaceport  43 นาทีที่ผ่านมา

      What is it priced at and does it include CPU?

  • @int_pro
    @int_pro 20 ชั่วโมงที่ผ่านมา

    Any video of it running the big Llama 3 model?

  • @MichaelAsgian
    @MichaelAsgian 3 วันที่ผ่านมา

    I'm curious whether it's better to have identical 3090s or if using different brands wouldn't make a difference. Also, is it possible to mix 3090s with 4090s?

    • @DigitalSpaceport
      @DigitalSpaceport  2 วันที่ผ่านมา

      Yes, you can mix any GPUs, even extremes like a 1070 and a 4090, and benefit from the added VRAM; however, if you have slower CUDA cores (older generations are usually slower) then performance will baseline at the slowest card's level. Mixing a 3090 and a 4090 at quant 8 or lower makes next to no discernible difference. Mixing 30/40 series at FP16, you will see a slowdown of several t/s. I like keeping it to one model of 3090 if you can, since you'll likely need to clean/repad 3090s especially; it's easier to know which screws and pads go in which spots that way.
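
      If a much slower card is dragging that baseline down, one option is to hide it from the runtime entirely. A sketch of launching the ollama server pinned to specific cards; CUDA_VISIBLE_DEVICES is honored by the CUDA runtime, and the indices below are placeholders, not this rig's layout:

          # pin_gpus.py - hypothetical launcher that exposes only selected GPUs to "ollama serve"
          import os
          import subprocess

          env = os.environ.copy()
          env["CUDA_VISIBLE_DEVICES"] = "0,1,2,3"  # e.g. drop a slower card's index from this list

          # Runs the server with only the listed GPUs visible; blocks until the server exits.
          subprocess.run(["ollama", "serve"], env=env)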

  • @LucasAlves-bs7pf
    @LucasAlves-bs7pf 2 หลายเดือนก่อน +5

    Does it work to mix different generations of GPUs, like RTX 30 and RTX 40? Nice job!

    • @dude2093
      @dude2093 2 หลายเดือนก่อน +3

      Yes

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +3

      Yes, for inference, like running premade models, you can mix GPUs and PCIe bandwidth is not that important. I'm going to test mixing in various other GPUs to see how this impacts performance. For training you want the cards to be as closely matched as possible.

  • @tpadilha84
    @tpadilha84 10 วันที่ผ่านมา

    A good alternative is buying a refurbished Mac Studio with M1 Ultra + 128GB RAM on eBay for around $3k. The M1 Ultra with 128GB RAM will run 70b models at q8 precision at ~7.5 tokens/second and draws less than 100W when running such models. Additionally, you can configure it to allow up to 120GB of RAM for the GPU (a sketch of that tweak follows at the end of this thread), which should be enough to run 70b models at 64k token context.

    • @DigitalSpaceport
      @DigitalSpaceport  10 วันที่ผ่านมา

      I have one big issue with the Mac Studio route, and that is that the tokens/second fall into what I deem an unusable range for mid-size models. Under 10 is painful and discouraging to use, imo.

    • @tpadilha84
      @tpadilha84 10 วันที่ผ่านมา

      @@DigitalSpaceport 7 tokens/second is slightly above the speed at which I can read with good comprehension, so anything above that speed doesn't make much difference for me when using the model in a chat UI. However, for using the model as an agent for automating tasks, then yes, this speed is very low.
      One thing I'm curious about is what kind of speeds you get when using larger contexts with the quad RTX 3090 setup. On the M1 Ultra it gets very slow for 70b models at close to 30k tokens of context, about 2-3 tokens/second.
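
      On the 120GB-for-the-GPU point above: the usual way to raise the wired GPU memory ceiling on Apple Silicon is a sysctl, sketched below. The exact key name is an assumption (it has changed across macOS versions), so verify it with sysctl -a before relying on it:

          # mac_gpu_ram.py - hypothetical helper; iogpu.wired_limit_mb is assumed, not verified here
          import subprocess

          limit_mb = 120 * 1024  # allow ~120 GB of unified memory to be wired for the GPU
          subprocess.run(["sudo", "sysctl", f"iogpu.wired_limit_mb={limit_mb}"], check=True)
          # the setting resets on reboot, so rerun it after restarts if it checks out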

  • @johndelabretonne2373
    @johndelabretonne2373 4 วันที่ผ่านมา

    When you're paying roughly $2,400 for a 32GB 5090 and most likely $1,200+ for a 16GB 5080, I would expect the 4090s to be selling for at least $1,400+. The 3090 will probably continue to be the best bet in town!

    • @DigitalSpaceport
      @DigitalSpaceport  3 วันที่ผ่านมา

      I'm likely selling my 4090s in anticipation of the 5090 launch. Going to camp out or do whatever it takes to get one when they launch. The 3090s just do a great job, so they get to stay. A 3060 12GB is on the way currently lol. I *may* have a GPU problem

  • @RyanKnowsTechStuff
    @RyanKnowsTechStuff 5 วันที่ผ่านมา

    I have an Epyc 7551P, 256GB, with a Tesla P4. Going to potentially put my 3090 in it. The 7551P also does 2GHz, for 32 cores. Do you find 64 cores at 2GHz works well, or is the CPU clock speed a bottleneck regardless of core count?

  • @dude2093
    @dude2093 2 หลายเดือนก่อน +9

    No Ollama demo?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +7

      Yeah, I'm separating the hardware videos, software install/config videos, and benchmarking videos. Those will all be out very soon.

  • @TheSasquatchjones
    @TheSasquatchjones 2 หลายเดือนก่อน

    Loving this content

  • @Angel24112411
    @Angel24112411 หลายเดือนก่อน +1

    I missed where Ollama training is shown and how you tell it to divide itself among the 4 GPUs. Can you fit a 70GB model in these 4 GPUs, for example something in FP16 with ~30 billion params?

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      The user does not tell it how to split the layers; the underlying parallelism is applied automagically by llama.cpp, which powers ollama. You cannot fit llama3.1-70b-instruct at FP16 into just 96GB of VRAM. That takes 140GB (ollama.com/library/llama3.1/tags), but you can fit the q8.
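
      The rough weights-only arithmetic behind those numbers; KV cache and runtime overhead come on top, so treat these as lower bounds:

          # vram_estimate.py - back-of-the-envelope weight sizes for a 70B-parameter model
          params = 70e9

          for label, bytes_per_param in [("FP16", 2.0), ("q8", 1.0), ("q4", 0.5)]:
              gb = params * bytes_per_param / 1e9
              print(f"{label:>4}: ~{gb:.0f} GB of weights")

          # FP16 ~140 GB, q8 ~70 GB, q4 ~35 GB, weights only -- so FP16 overflows
          # 4x 24 GB = 96 GB of VRAM, while q8 fits with headroom for context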

  • @MrDenisJoshua
    @MrDenisJoshua วันที่ผ่านมา

    Do you think it's possible to make a server like this, but add a file server also?
    I mean, I want to make a server that I can use as a NAS, an Emby/Plex server, and for AI...
    I want to maybe use Proxmox and share the GPU for all these servers...
    Is this possible, please?
    Thanks a lot

  • @AdminUser-k1x
    @AdminUser-k1x 5 วันที่ผ่านมา

    Thank you for the amazing content.
    I bought the same setup using your links. I am having a hard time understanding where to plug in the power switch. How are you turning yours on/off? Is there any spot on the motherboard I can plug the switch into?

    • @DigitalSpaceport
      @DigitalSpaceport  5 วันที่ผ่านมา +1

      Look up the board's manual online by its model number, then search that document for PWR and it will show you the header pin positions. The black cable (-) is ground and the red is (+).

  • @MrButuz
    @MrButuz 2 หลายเดือนก่อน +3

    Looks cool. I prefer Founders Editions, so pretty and such high build quality. Oh, by the way, your mobo & EATX power connectors didn't look pushed in properly?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +1

      I love the FE editions also, they are works of art. I have an FE 3070 and 3080 Ti, but noticed that on the used markets the 3090 FE is more than just slightly more expensive. Good eye! I just went and seated it fully.

  • @TheYoutubes-f1s
    @TheYoutubes-f1s 2 หลายเดือนก่อน +2

    Would love to see a Geekbench result for this machine.

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      Geekbench? If that can run on Ubuntu 22, I'll toss it into the benchmarking video.

  • @quercus3290
    @quercus3290 หลายเดือนก่อน

    You could have mounted the CPU radiator on the shelf below, level with the GPUs? It might help take the strain off that one hose. Dude, 12:34, what are you into lol, that's some setup.

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      Yeah, I want to fabricate an entirely new case. This is just not optimal. In a nutshell, I keep a backup of the USGS GeoTIFFs and I run a geospatial-rendering-based workload for my business. It can now be done faster with GPUs at nearly the same quality as CPUs, so those R930s are not really needed as much.

    • @quercus3290
      @quercus3290 หลายเดือนก่อน

      @@DigitalSpaceport Cool stuff, just recently watched a tutorial on VAPOR for WRF-Fire. I'm starting to learn a bit about visualization with matplotlib, mostly on dataset embedding and query returns.

  • @husratmehmood2629
    @husratmehmood2629 29 วันที่ผ่านมา

    Hi, totally awesome. How can I build a server to get performance equal to an AMD Threadripper Pro 7995WX, with an RTX 4090 and 128GB of 6400MHz RAM with PCIe 5.0 NVMe? I am doing research on building a server for training my AI & ML models. I considered AWS but it's very costly, so I am considering my own server.

  • @mams4480
    @mams4480 หลายเดือนก่อน +1

    Hey, what are your thoughts on mix-and-match GPUs (i.e., dual 3090s and an RTX 4000/4500 Ada)? Are there any benefits or disadvantages to mixing and matching versus all the same GPUs?

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      Good question! I'm not sure; I guess I will test that out here actually. I would guess that core speed/VRAM speed will dictate a lowest-common-denominator outcome, as all pieces of work need to be completed before the response. I am also curious about scaling up VRAM via non-homogeneous routes and the impact that has. I think layers are propagated in ollama/Open WebUI intelligently to each GPU based on VRAM capacity. I'm going to check; this is an important question. Thanks for asking!

  • @ArouzedLamp
    @ArouzedLamp 2 หลายเดือนก่อน +5

    Explain it to me like I'm new to this.
    Why would you want to run an AI server? What applications would this enable, and is it actually any better than building a server with more consumer-type parts, i.e. a 7950X or 7900X + ONE 3090?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +4

      Love this question. I'm going to quote it in the single-GPU video, which will fully answer the why part. Of note is speed for inference (processing) requests to the system, and models landing inside VRAM is of course ideal, to the tune of like 10x speedups. That's a major reason. Several other big ones exist as well.

  • @mastermoarman
    @mastermoarman 2 หลายเดือนก่อน +1

    You should get Hailo to sponsor a video with their 8 or 10H M.2 module.
    Also, how many TOPS is this setup?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +1

      Im open to free storage gear. Like VERY OPEN lol.

    • @mastermoarman
      @mastermoarman 2 หลายเดือนก่อน

      They aren't storage. They are AI compute modules: 26 and 40 TOPS of compute at less than 5W.

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      I'm technically very open to all gear lol. I'll have more on all the stats in the benchmark video, but the 70b so far is looking good on tok/s at 17.7, and 98 for 8b llama3.1.
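
      For anyone wanting to reproduce those tok/s numbers, a small sketch against Ollama's local REST API (it assumes a default install listening on port 11434 and that the example model tag is already pulled):

          # toks_per_sec.py - measure generation speed from ollama's /api/generate stats
          import json
          import urllib.request

          req = urllib.request.Request(
              "http://localhost:11434/api/generate",
              data=json.dumps({
                  "model": "llama3.1:70b",  # example tag; use whatever is pulled locally
                  "prompt": "Write a short story about a homelab.",
                  "stream": False,
              }).encode(),
              headers={"Content-Type": "application/json"},
          )

          with urllib.request.urlopen(req) as resp:
              stats = json.load(resp)

          # eval_count is generated tokens, eval_duration is in nanoseconds
          print(f"{stats['eval_count'] / (stats['eval_duration'] / 1e9):.1f} tok/s")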

  • @Krath1988
    @Krath1988 7 วันที่ผ่านมา

    Can you do a video on the software setup?

    • @DigitalSpaceport
      @DigitalSpaceport  6 วันที่ผ่านมา +1

      Here you go th-cam.com/video/TmNSDkjDTOs/w-d-xo.html

    • @Krath1988
      @Krath1988 6 วันที่ผ่านมา

      @@DigitalSpaceport Nice! Thank you.

  • @HKashaf
    @HKashaf หลายเดือนก่อน

    I don't understand something with this setup: aren't you limited to just small LLMs? Mainly because only 2 RTX 3090s can sync together via NVLink, so you essentially have 2 pairs of RTXs with your four cards.
    Also, I'm wondering about PCIe bottlenecks.
    Lastly, I would advise getting enough RAM to load an entire 300-billion-parameter LLM, which works out to about 1.2 TB.
    Could you please discuss the limitations of this setup?

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน +1

      No, that's not a correct starting point for assumptions, but it's one I started with as well. It's poorly discussed, but I'm working on talking about and sharing much more of what I'm learning about all this. You do not use NVLink for inference. The llama.cpp runner code automatically layers the model across the GPUs, so outside of high-end training there's no need for NVLink. To be clear, I'm using no NVLink. It can also layer the model into system RAM, but there's no need to run any large-parameter model off system RAM, as performance is abysmal. Even on the world's fastest CPU/RAM combo it is unacceptably slow; think 1 tk/s at q4 for llama3.1 405b.

    • @HKashaf
      @HKashaf หลายเดือนก่อน

      @@DigitalSpaceport thanks

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน +1

      I made this video that shows this pretty well also th-cam.com/video/-heFPHKy3jY/w-d-xo.html

  • @TheInternalNet
    @TheInternalNet 2 หลายเดือนก่อน

    Wow that's a super impressive build. I'm looking at doing the same gpus with the Lenovo p520 or Lenovo p920.

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      Lenovo used systems price point is pretty attractive!

  • @LucasAlves-bs7pf
    @LucasAlves-bs7pf 2 หลายเดือนก่อน +2

    Please test mixing different-VRAM-size cards, like a 3090 (24GB) and a 4070 (12GB). Can it balance the work in a way that doesn't crash when it hits the 12GB mark?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +1

      I am planning on this and a few other tests. Here are some cards I have on hand that I may run mixed-workload testing against: 3060 Ti, 3070, 4090, A4000, A5000. I think the A4000 + 3070 + 2x 3090 would be a good test.

  • @zeusconquers
    @zeusconquers หลายเดือนก่อน

    Great job keeping that under 5k. I made so many mistakes, like dual Xeon Gold 6148s, which didn't cost me money but did cost time. I got mine to about 5700 and it is not as good as yours.

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      6148s are pretty nice chips also though! Are you putting it in an open-air frame or a rack case?

  • @stuffinfinland
    @stuffinfinland 19 วันที่ผ่านมา +1

    These 4 GPUs should only draw 250-300W altogether?

    • @DigitalSpaceport
      @DigitalSpaceport  18 วันที่ผ่านมา

      Due to the way the model workload splits across the GPUs when you are using their VRAM, they often sit at around 25% utilization on the processors. There are other ways to split workloads, but llama.cpp is under the ollama hood, so that would need to be addressed there. Tensor parallelism is the term.
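
      For contrast, a sketch of that tensor-parallel style of splitting using vLLM's Python API, which shards each layer across all the cards instead of stacking whole layers per card (vLLM and the model id here are assumptions, not what this build runs):

          # tensor_parallel_sketch.py - hypothetical vLLM setup spreading one model across 4 GPUs
          from vllm import LLM, SamplingParams

          llm = LLM(
              model="meta-llama/Llama-3.1-8B-Instruct",  # example HF model id; needs access/download
              tensor_parallel_size=4,                    # shard every layer across the 4 cards
              gpu_memory_utilization=0.90,
          )

          outputs = llm.generate(
              ["Explain tensor parallelism in one paragraph."],
              SamplingParams(max_tokens=200),
          )
          print(outputs[0].outputs[0].text)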

  • @boukeelsinghorst4848
    @boukeelsinghorst4848 2 หลายเดือนก่อน

    How do you plan to cool your mainboard? The board is made to run in a server enclosure, and in your current setup only the top of the rack with the GPUs is actively cooled. How do you plan to cool the DDR and other chips on the motherboard?
    I run an H12SSL-i motherboard with an Epyc 7573X and 2x 3090 and went with an Arctic Freezer 4U air cooler instead of water cooling to get that extra needed airflow inside a 4U server case. I was considering the Gigabyte motherboard, but since I don't use a riser setup the top PCIe slots wouldn't be usable, since the GPU would sit directly over the CPU socket.

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +1

      I think the H12 is an excellent choice also! I have one myself going into a 4U case soon. Four fat 3090s wouldn't fit in that case however, and this board is very cheap. I have a small Vornado-knockoff fan that moves a lot of air over the mobo. That's shown in the most recent video now also.

  • @fragtrap0083
    @fragtrap0083 2 หลายเดือนก่อน +1

    Have you tried renting out your hardware with vast ai or salad?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      I need better upload speeds. The cable modem has me capped at 40 Mbit upload, but a high-split upgrade is on the way and should make that a viable route for idle time. I need to think about reservations and utilization more before I put this rig on it, but competition is pretty high on DePIN, and it's impossible at 40 Mbit.

  • @ArielLothlorien
    @ArielLothlorien 13 วันที่ผ่านมา

    Yes but how do you get any LLM to run on all that? For example, llama v3 requires a high VRAM count. Does this get around that per card VRAM by being able to aggregate the VRAM or is that not a thing?

    • @DigitalSpaceport
      @DigitalSpaceport  11 วันที่ผ่านมา

      Yes it spans the vram of all the cards needed to fit the model

  • @mawkuri5496
    @mawkuri5496 หลายเดือนก่อน

    Can you test dual Asus AI accelerator cards vs. that quad 3090 for comparison, to see which is much faster at running AI and training new models?

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      I dont have those cards and I dont know if they would be compatible either.

  • @SuperSayiyajin
    @SuperSayiyajin หลายเดือนก่อน

    Thanks for the review. I have an Asus Z10PE-D16 WS mainboard, 2x Xeon 2683 v3, 8x 16GB DDR4 2133P, 5x 3090, and multiple Corsair 1500i PSUs. Tried 70b q8 and q4 and 405b q2. They are extremely slow. What am I missing? What is 4i SFF-8654? Ty

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      Have you checked with nvtop, while running, that the models are actually hitting GPU VRAM during operation? If it's running slow, that's the place to start.
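
      A scriptable version of that check, asking the Ollama server how much of each loaded model actually landed in VRAM (this assumes a default install on port 11434 and the size fields its /api/ps endpoint reports):

          # check_vram_split.py - ask the ollama server where each loaded model is resident
          import json
          import urllib.request

          with urllib.request.urlopen("http://localhost:11434/api/ps") as resp:
              running = json.load(resp)

          for m in running.get("models", []):
              total = m["size"]         # bytes for the loaded model
              in_vram = m["size_vram"]  # bytes resident on the GPUs
              pct = 100 * in_vram / total if total else 0
              print(f"{m['name']}: {pct:.0f}% in VRAM ({in_vram / 1e9:.1f} of {total / 1e9:.1f} GB)")
          # anything well under 100% means layers spilled to system RAM, which is where slowdowns come from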

  • @8888-u6n
    @8888-u6n 2 หลายเดือนก่อน

    Great video 👍 Can you make a video of your system running Llama 3.1 70b?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      While this is mainly a tutorial for getting Open WebUI, ollama, and Meta Llama 3.1 set up on Ubuntu, it does feature me running the 70b, and while the stats I shared for a story generation may not be the same as for hard logic, it's pretty good. I'll have full in-depth testing on 8 and 70 soon. 405 is now giving me issues; it was running a few days ago... The stats part is closer to the end. th-cam.com/video/q_cDvCq1pww/w-d-xo.html

  • @yaterifalimiliyoni9929
    @yaterifalimiliyoni9929 2 หลายเดือนก่อน

    This is dope and extremely cost-effective, but it's not future-proof. What happens if 2x 5090 makes it possible to run a Llama 4 1T?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      No, it's not future-proof at all, but I wanted to wait until we see the next Nvidia GPUs before I decide on something bigger. I don't think we will see more than 24GB of VRAM in the 5090 currently, and while model splitting is a thing and does work... it's pretty slow.

  • @___x__x_r___xa__x_____f______
    @___x__x_r___xa__x_____f______ หลายเดือนก่อน

    What about running diffusion models? Can one use NVLink to increase unified VRAM to fit big models? Would it be possible to switch to 4090s for extra speed?

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      Im not sure nvlink is needed now. I think with LLMs at least you can count on the layers being propagated with something like ollama automagically. Not sure about diffusers but will keep an eye on nvtop when I do that video.

  • @xainslik8138
    @xainslik8138 2 หลายเดือนก่อน +1

    Can you make a video follow up on use case

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +1

      Yes, I forgot to mention in this video that I'm splitting up the hardware, software setup/config, and benchmark videos. Use-case definition will be covered in the software videos.

  • @joshhardin666
    @joshhardin666 2 หลายเดือนก่อน

    I'd love to do something like this, and I have some reasonable hardware to make it happen, but I straight up don't have the power. What do you use as a power source? A giant solar array? My power in CT just went up to $0.35/kWh.

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +1

      I'm on grid power unfortunately still, but that likely changes this year. Our rate is $0.10/kWh and I'm on a co-op that does a great job controlling costs. We do have land for a ground-based array onsite, but trenching in limestone is expensive. Austin gets a lot of sun, so it likely makes good sense for us. At $0.35 I'm not sure what I would do!

  • @ziozzot
    @ziozzot 2 หลายเดือนก่อน +3

    Would also be nice for rendering Blender scenes

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      Ill add this to the benchmarks 👍

  • @SuperSayiyajin
    @SuperSayiyajin 2 หลายเดือนก่อน

    Which ollama model did you use? What is token count? There is not any info...

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      I'm using Meta's llama 3.1 70b and it hits between 17 and 22 tok/s. 8b hits around 95 and 405 hits around 1. I have a full video on each model coming up, but in this video I think I have a chapter on llama 3.1 70b you could check. th-cam.com/video/q_cDvCq1pww/w-d-xo.html

  • @mams4480
    @mams4480 หลายเดือนก่อน

    Any specific reason for going with the XianXian GPU rack instead of "AAAwave The Sluice V.2"?

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      Yes the price is lower on the one I have included and from what I can tell they all look like the exact same rack. So going cheap FTW.

  • @ShitpostingArchive
    @ShitpostingArchive 2 หลายเดือนก่อน +3

    can you run llama3.1:405b model on this?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +2

      Okay, I did get 405b to run on this. It was EXTREMELY slow however; I would class it as unusable. That was not unexpected, but only 44 layers of 145 can load into VRAM on the GPUs, so I guess I would need ~12 GPUs of 24GB to run it at respectable speeds. It hit 0.75 tok/s at 2048 context, which ended up being around 6 minutes of generation time on easy logic.
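
      A rough sanity check on that ~12 GPU estimate; the quant size and per-card headroom below are assumptions and ignore the KV cache:

          # gpu_count_estimate.py - how many 24 GB cards a ~405B q4 model roughly needs
          import math

          params = 405e9
          bytes_per_param = 0.5                        # ~4-bit quant
          weights_gb = params * bytes_per_param / 1e9  # ~203 GB of weights

          usable_per_card_gb = 18                      # assume ~18 of 24 GB usable after context/overhead
          print(math.ceil(weights_gb / usable_per_card_gb), "cards, roughly")  # -> 12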

    • @ShitpostingArchive
      @ShitpostingArchive 2 หลายเดือนก่อน +1

      @@DigitalSpaceport Thank you very much for testing. If you are just limited by VRAM, would it be feasible to run M40s instead? I have seen them go for 170€ on eBay.

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +1

      I'd be surprised, as the Maxwell generation is pretty old now: compute capability 5.2 and also PCIe 3. I'd not go with those cards, but there may be more recent ones I should check into.

  • @jesusleguiza77
    @jesusleguiza77 28 วันที่ผ่านมา

    What cheap motherboard do you recommend for 2x RTX 3090? Regards

    • @DigitalSpaceport
      @DigitalSpaceport  27 วันที่ผ่านมา

      Inference only or do you need the ability to run them at full PCIE 16X gen 4 speeds simultaneously like with Training?

    • @jesusleguiza77
      @jesusleguiza77 27 วันที่ผ่านมา

      @@DigitalSpaceport both options please

  • @frankwong9486
    @frankwong9486 2 หลายเดือนก่อน

    It reminds me of those miners: déjà vu, I have been in this place before 😂

  • @Beauty.and.FashionPhotographer
    @Beauty.and.FashionPhotographer 2 หลายเดือนก่อน

    did i miss any Pricing comparisons and infos in the video?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +1

      I didnt do direct price comparisons. I would suggest you consider the H12SSL for a mobo however. Its worth the extra imo.

    • @Beauty.and.FashionPhotographer
      @Beauty.and.FashionPhotographer 2 หลายเดือนก่อน

      @@DigitalSpaceport what is th meaning of the word Mobo

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +1

      Motherboard

  • @ChristianRojas
    @ChristianRojas 25 วันที่ผ่านมา

    which ollama 3.1 model have you deployed / tested?

    • @DigitalSpaceport
      @DigitalSpaceport  18 วันที่ผ่านมา

      all of them. Anything specific you are looking for an answer to?

  • @natc9
    @natc9 2 หลายเดือนก่อน

    Do you have to use even number of GPU(4)? Will it work with 3 GPUs?

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน +1

      Yeah 3 will work, just remember that the VRAM is additive so you want the whole model to fit into the VRAM of the cumulative cards.

    • @natc9
      @natc9 หลายเดือนก่อน

      @@DigitalSpaceport thank you very much for your reply, I'm just getting into building pc for LLM and gathering information on which gpu I should use and how multiple gpu can be beneficial

  • @ZaPirate
    @ZaPirate 2 หลายเดือนก่อน

    why didn't you go for a tower cooler? There are some decent 3U/4U options that are not loud and the performance is more than adequate. Please note that server motherboards rely on airflow over the VRM for optimal operation. You could run the risk of hitting thermal limits and cause throttling/shutdown of the system.

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      Yes, I have an HDX Vornado-ripoff mini fan that I have pointed at the mobo. It will be in the testing video. I do have tower coolers, but they are all in use in other systems currently. This Corsair 420 I had for free, and I very well might be putting the 7995WX into this rack at some point for testing on the fastest platform available.

    • @ZaPirate
      @ZaPirate 2 หลายเดือนก่อน +1

      @@DigitalSpaceport if it's free, then all good. great video

  • @Grapheneolic
    @Grapheneolic หลายเดือนก่อน +1

    does it matter if I purchase a 7702 over a 7702p amd epyc cpu?

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      @@Grapheneolic The 7702P is fine for single-socket boards. The only difference is it won't allow for a second processor.

    • @Grapheneolic
      @Grapheneolic หลายเดือนก่อน

      @@DigitalSpaceport Thanks for the quick reply. So given I purchased a 7702, I could technically add a second processor if wanted to?

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน +1

      @Grapheneolic if you have a motherboard with a second socket, yes.

  • @ErikFrits
    @ErikFrits หลายเดือนก่อน

    how did you get 3090 so cheap ?
    In Europe they are 1500 a pop.

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      They had been used by a friend for ethereum mining prior, in a harsh environment. The amount of dirt I had to clean off these was really a lot. The pads had also been destroyed. All replaced now but a lot of work.

  • @chirvo
    @chirvo 14 วันที่ผ่านมา +1

    I used to build systems like these to mine Ethereum

    • @DigitalSpaceport
      @DigitalSpaceport  11 วันที่ผ่านมา

      Same rack, yup, with some modification to fit full-bandwidth risers. I'm going to work on a larger one next lol, need more GPUs ha

  • @araa5184
    @araa5184 23 วันที่ผ่านมา

    I wonder what the "really cool AI and other things" are? Outside of maybe home AI and maybe some prompting, I can't really wrap my mind around hosting an LLM. Can anyone tell me the other applications?

    • @DigitalSpaceport
      @DigitalSpaceport  18 วันที่ผ่านมา

      Check the most recent video here for some examples of vision routing and real-time web search engine hosting. I didn't want to drag that video on longer, and I am building and learning in real time also (sharing along the way), and there are more functional use-case-based videos coming. I agree that part is lacking in this video, but it was only intended to showcase how to build the thing.

  • @sarahracing2619
    @sarahracing2619 2 หลายเดือนก่อน

    Nice video. Work on that audio though. The voice overs sound off.

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      I work and record in a harder audio environment than any other homelab YouTuber, I hope you consider that as well. I already spent over an hour on the audio for this, and it's impossible to get clean audio without shutting down the rack machines. If I were in a studio like they are, I would for sure be embarrassed at the audio quality, but I'm 8 ft away from a mini datacenter. I do want to set your expectations ahead of time that this may be the audio quality I can achieve.

  • @simo.koivukoski
    @simo.koivukoski 2 หลายเดือนก่อน +1

    Why no NVIDIA NVLink used?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +2

      For 3090s I'm not sure it does anything for inference tasks? Does it? I have a dual-A5000 setup with NVLink and it does enable a larger non-sharded memory size, but I only know of that in the context of GIS. Also, just to be clear, I'm pretty new to running local AI and not trying to LARP as an expert. I'm here learning myself also.

  • @mjes911
    @mjes911 2 หลายเดือนก่อน

    Tap 3 screws through the mobo?? 😮

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน +1

      LOL oh god no. I did mount the board up and use a pencil to mark the 3 spots. Then removed the board and tapped the 3 spots. I'm not that crazy!

    • @mjes911
      @mjes911 2 หลายเดือนก่อน

      @@DigitalSpaceport phewww lol 😂

  • @FrgottenFrshness
    @FrgottenFrshness 4 วันที่ผ่านมา

    There is absolutely no way you are going to see any kind of condensation unless the room is at below-freezing temperatures or you are using liquid nitrogen, so why even mention condensation???

    • @DigitalSpaceport
      @DigitalSpaceport  4 วันที่ผ่านมา

      The window AC spits out sub-32F air, and one of many plans I considered was to have the heatsinks right next to it. I opted not to, and everything is cooled great from a distance as well. I do see condensation at times on the AC directional fins and need to wipe it off and pay attention to it so I don't get mold growth.

  • @davidunderwood9037
    @davidunderwood9037 2 หลายเดือนก่อน +1

    But, will it mine (Bitcoin)?

  • @FSK1138
    @FSK1138 2 หลายเดือนก่อน

    $500 challenge

  • @int_pro
    @int_pro 20 ชั่วโมงที่ผ่านมา

    RIP your power bill. 😢

    • @DigitalSpaceport
      @DigitalSpaceport  36 นาทีที่ผ่านมา

      It's not that bad. When we bought the house we made sure to go outside the city-owned utility to a much cheaper co-op. Under $300/mo for the whole house in central Texas is very decent.

  • @squirrel6687
    @squirrel6687 2 หลายเดือนก่อน

    Unlike gaming, AI and machine learning really do not benefit from x16 vs x8 lanes. That is because models are loaded once. Once the model or models are loaded into VRAM, the CPU has a minimal effect. Now, if you are pooling VRAM with NVLink, it is so much faster than PCIe 3.0, 4.0, or even 5.0 by a long shot. Also, though I have U.2 access with both the Z590 Dark and Z690 Dark Kingpin, they pale in comparison to the speeds of native PCIe 3.0 and 4.0 NVMe.
    I, too, have that same chassis from mining but have always wondered how it would perform as an AI frame; just haven't gotten around to tinkering. At 3:20 I've stopped, because the experience gained from the last year of the Ethereum GPU mining boom to now is sufficient, and for me, I doubt there is any real new value.

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      Oh, I'm doing training also, but yeah, full lanes are not needed for inference. I did mention that. Can I NVLink the 3090? I've read it's a minimal return recently. I guess the channel isn't for you, no harm at all there lol.

  • @FrostDagger
    @FrostDagger 13 วันที่ผ่านมา

    How to build a ai girlfriend

    • @DigitalSpaceport
      @DigitalSpaceport  11 วันที่ผ่านมา +1

      Okay just for you, im gonna try to make one. Wife might end me though 😆

  • @大支爺
    @大支爺 หลายเดือนก่อน

    2x 4090 is better than 4x 3090 by all means.

    • @DigitalSpaceport
      @DigitalSpaceport  หลายเดือนก่อน

      except for total VRAM amount but I do agree also as an owner of 2 4090s

  • @rob8823
    @rob8823 2 หลายเดือนก่อน

    Will i be good at fortnite finally?

    • @DigitalSpaceport
      @DigitalSpaceport  2 หลายเดือนก่อน

      That game is impossible. There is always a tween on a cell phone that is faster!

  • @winsucker7755
    @winsucker7755 2 หลายเดือนก่อน +1

    Watching this video with $100 on my account :|