I love that you guys are so upfront about presenting anything that could remotely come close to “sponsored content”. Makes me have even more trust in STH. I wouldn’t blame you for not mentioning it because of course you need Intel’s permission and maybe they paid for flights or whatever. But right away you guys mention it just so there is no ambiguity. I know that it’s still editorially independent though which is great.
Appreciate the work you all do and Patrick’s infectious enthusiasm
Always like to explain what is happening.
5 kW a server is easy peasy. The problem is with the 10 kW H100-based DGX servers. The cooling is not the problem either; it is getting the power into the rack.
We are going to have a demo on the STH main site from OCP Summit 2023 where they are now liquid cooling the power delivery to help power AI servers
@@ServeTheHomeVideo I don't think you get it. What follows is our *real* experience of trying to get ready to handle these servers, some of which are already onsite. We currently can only accommodate a single H100 DGX box in a rack, and the rest of the rack is more or less empty with some test boxes we don't care about. The reason is that if even a single PSU fails on an H100 DGX box it will cause a cascade failure of the rack as all the breakers trip.
First off, liquid cooling is at this juncture a waste of time. So let's look at the Nvidia H100 DGX server. It's 5U in height with a 10.8 kW power draw using six C19 leads. I can therefore squeeze a maximum of 8 of these in a 42U rack. That would require ~87 kW of cooling. The 12-year-old water-cooled rear doors on my racks are good for 93 kW of cooling, so I am already good to go with my cooling and have been for over a decade.
The problem is the power. So I am upgrading the power delivery to the racks (this is underway) to get three 32A 3-phase supplies per rack. Being in Europe, that gives me ~66 kW of electrical power in the rack. This is going to be worse in North America with your anaemic electrical system. Immediately I have to drop the number of servers per rack down to six because I can't power them. Not great, but it is acceptable. However, it gets worse, as I need six C19 outlets per server. The *vast* majority of zero-U PDUs only have six C19 outlets, so now I am down to three servers per rack. I did find one with 12 C19 outlets per PDU but, on further investigation, it turns out some tuna melt paired the C19 outlets up and put each pair behind a 20A breaker, so it is less useful than a chocolate teapot, which at least you can eat. Now I am down to getting custom PDUs made, but they are dumb and don't come with power monitoring. It all really, really sucks.
If you drop back to an eight-way A100 box at 4U from a tier-one vendor, then you can get 10 of them in a single rack, no problem. The 66 kW of power is sufficient, and being in Europe with 230/240V AC I can use C13 to C20 leads without changing the laws of physics as happens in North America. So suitable PDUs are easy to come by. Note the C13/C14 connector is, as designed, rated for a maximum of 10A. Due to the anaemic electrical system in North America you have decided the laws of physics don't apply and it's magically safe to use with a 15A load! It's no wonder the rate of electrical fires is much higher in North America than in Europe.
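For anyone who wants to redo the arithmetic above for their own site, here is a minimal Python sketch that uses the figures quoted in this comment as assumptions (10.8 kW and six C19 leads per DGX H100, three 32A three-phase feeds at 230V line-to-neutral, 93 kW rear-door cooling, one 6-outlet 0U PDU per feed). It simply reports which constraint limits servers per rack:

SERVER_KW = 10.8        # quoted H100 DGX power draw (assumption from the comment above)
SERVER_U = 5            # height per server as stated above
RACK_U = 42
LEADS_PER_SERVER = 6    # C19 leads per server

FEEDS = 3               # 32A three-phase supplies per rack
PHASE_CURRENT_A = 32
PHASE_VOLTAGE_V = 230   # line-to-neutral, Europe
rack_power_kw = FEEDS * 3 * PHASE_CURRENT_A * PHASE_VOLTAGE_V / 1000  # ~66 kW

COOLING_KW = 93.0       # rear-door heat exchanger capacity
OUTLETS = FEEDS * 6     # assuming one 6-outlet 0U PDU per feed

by_space = RACK_U // SERVER_U
by_power = int(rack_power_kw // SERVER_KW)
by_cooling = int(COOLING_KW // SERVER_KW)
by_outlets = OUTLETS // LEADS_PER_SERVER

print("limits -> space:", by_space, "power:", by_power,
      "cooling:", by_cooling, "outlets:", by_outlets)
print("servers per rack:", min(by_space, by_power, by_cooling, by_outlets))

With those numbers it prints space 8, power 6, cooling 8, outlets 3, i.e. the 6-outlet PDUs are the binding constraint, matching the three-servers-per-rack conclusion above.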
The power draw for the performance you get from the Gaudi2 shows how efficient they are.
In any case, as a hardware enthusiast this video was amazing, and those 4U servers... WOW
We have been reviewing the 8-10 GPU systems since like 2016-2017. They are always awesome.
Looks like the free hardware tier noted in the video is no longer available. The least expensive option I see is $0.45/hour for a Small VM with an Intel® Xeon® 4th Gen Scalable processor: 8 cores, 16 GB memory, 20 GB disk.
Great show. When they swap that old stuff out, I'd like to be at the swap meet.
🍿
Ha! I wish that was an option.
Nice, thank you to Intel and Patrick for showing us this
This was a super fun video to do!
Just a Sony FX3 chilling on the table like it ain't no thang. 😆 Cool video.
Canon R5C, C70 and Sony A1, FX30, and behind me a FX6 as well. This was one of the last videos shot before we fully took down the studio in Austin to move to Scottsdale so cameras were all on that shelf.
Pretty cool, thanks for the video!
Glad you enjoyed it. Have a great weekend.
Kinda sad if you consider that the majority of those "AI workloads" might be analyses geared towards selling more junk to customers, e.g. advertisements, customer behavior analysis, stuff like that
AI is going to be used for almost everything, so not much we can do at this point.
Harley Davidson. 65 hp for making noise. 5 hp for pushing you forward.
Intel. 4950 W for heating the air. 50 W for doing useful work.
Arm. :)
Too bad Intel's AI/GPGPU products have come out close to a generation too late to be competitive. Nvidia certainly could use some competition to bring down pricing; they pretty much want your firstborn at this point, since they are basically selling anything they can produce no matter the price. But if you make back the entire cost in energy savings from the better efficiency, it will still be worth it for customers.
Word on the street is that Gaudi 2 is actually very competitive from a price/performance perspective. Also, we covered this yesterday on the STH main site, but for only a few thousand GPUs (~$50M or less) NVIDIA is now pushing the L40S, which is much lower power/performance. Hopefully one day we can do Ponte Vecchio vs. L40S: www.servethehome.com/nvidia-l40s-is-the-nvidia-h100-ai-alternative-with-a-big-benefit-supermicro/
I think you might have broken the registration page, every time I click subscribe for the free tier I get an HTML only page that says "Request forbidden. Have a nice day!"
It happens sometimes.
🍿
I recall during the heat wave in Oregon a couple summers ago, we all got emails to shut off servers we didn't immediately need in Jones Farm because the AC couldn't keep up haha. Thankfully a lot of steps have been taken to prevent those extreme measures from needing to be taken again haha.
Very nice, nothing like having a day of 108 and then 112 and then finally that 116 temp :p
I was thinking that PDX got renamed to Phoenix somehow....
I had a room there during a 100F+ day and the AC was struggling
Just wondering, what camera vendor/model are you using? 😊
On this trip, in my hand is the Canon R5. I think we also had a Sony FX3 shooting some B-roll. Normal A-cam on the channel is the Canon C70.
As a firmware engineer at Jones Farm, I hope I'll be able to catch you next time you're at JF :D
Ha! People have certainly seen me going out of the cafeteria and elsewhere and said hi. Please do come say hi if you see me
Are you gonna be at SC next week? I kept seeing you last year but never had a chance to say hi
Yes I will be. I think we are going to have a few folks there
Hope to see you there! @@ServeTheHomeVideo
All of that AI hardware started SkyNet! 🤣 On serious note so friggin cool you were able to visit that data center. Can't tell you how jealous I am!
Thanks! It was super fun to see another part of it
Nice, I'm a potential customer looking for a 10tTOPS AI machine; can't wait to go play with $10M worth of hardware :DDD
Nice work Patrick !! this is awesome !!
Thanks! It was super fun to do
Looks pretty cool
Very cool indeed.
Do they have the Intel Xeon Max 9480 in the Dev Cloud?
I know there is Xeon Max in the Dev Cloud
Seems like the space has a great amount of heat capacity with the high ceiling. I wonder why they went with a high ceiling rather than an 8' or 9' ceiling height.
Not bad in the Oregon climate.
Hi, I want to build my own home lab for AI. I'm wondering how hard it is and where to start. I want to run LLMs, and I'm wondering if there is a better option than an Nvidia GPU, especially since one GPU wouldn't have enough VRAM for what I need. Is there a way around that? With Code Llama, the LLMs just won't start if you don't have enough VRAM. Is there any workaround for this?
Give up is the short answer. If you can't get by with the RAM on a single GPU, then it is not possible to do for under tens of thousands of dollars at the moment.
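As a rough back-of-the-envelope on the VRAM question: the weights alone take about (billions of parameters) × (bytes per parameter) gigabytes, before the KV cache and activations, which is why larger models won't even load on a single consumer card. A quick sketch, with the 24 GB card size and the model list purely as illustrative assumptions:

GPU_VRAM_GB = 24  # e.g. a single high-end consumer GPU (assumption)

def weights_gb(params_billion, bytes_per_param):
    # GB needed just for the weights; KV cache and activations add more on top.
    return params_billion * bytes_per_param

for name, params_b in [("7B", 7), ("13B", 13), ("34B", 34), ("70B", 70)]:
    fp16 = weights_gb(params_b, 2.0)   # 16-bit weights
    int4 = weights_gb(params_b, 0.5)   # 4-bit quantized weights
    fits = "fits" if int4 <= GPU_VRAM_GB else "does not fit"
    print(f"{name}: ~{fp16:.0f} GB at FP16, ~{int4:.0f} GB at 4-bit "
          f"({fits} in {GPU_VRAM_GB} GB at 4-bit)")

So a 7B or 13B model can squeeze onto one card with quantization, but a 70B model needs roughly 140 GB at FP16, which is where the multi-GPU, tens-of-thousands-of-dollars territory starts.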
How many times are you going to change your thumbnail? I count 5 times at least
The video has great watch time metrics, but CTR is not great, so we are trying different things. We are slightly too small to have access to the automatic YouTube A/B thumbnail testing feature, so we have to do it manually.
No H100/H200 or even A100/V100 means only a small % of compute-intensive AI devs would opt for it. The Intel is NOT more cost-effective, unless you can't get H100s.
NVIDIA was showing in its slides for the latest MLPerf Training v3.1 that Intel Gaudi2 is more cost-effective than the H100. That is even just accelerator costs, not including the fact that NVIDIA was using PCIe switches and InfiniBand rather than Ethernet straight from Gaudi2. The A100 is falling behind now since it does not support the FP8 Transformer Engine.
Thank you... could you please demo a K8s cluster setup on that Intel cloud platform, if time permits? Thank you again.
Cool idea. We actually did a $1 Proxmox VE cluster video at PhoenixNAP last year.
What's that AC plug? It looks like a J1772 car charger lol
Wow! Looks like a fun day at the park! Nice stuff, Patrick! Yup, "never been a better time to use the Cloud"! It's too bad Wendell couldn't see all this! 🍿🎉
Yup, this is the great thing about "quick pay plans" for a server: it goes beyond convenient, and fast! Good luck!
I think Wendell has been to the other part that we showed in this video since he was at that event with me.
Always great and informative videos... Job well done!
Much thanks! Have a great day.
Interesting sight that Intel uses 5-pin CEE connectors (the red ones) for the power supply of the racks. I would have expected some weird NEMA connector due to the location in the US. Do they really have 400V AC 3-phase power there?
I am not 100% sure, but I would not be surprised in the least given the cloud provider reliability lab is in the facility as well.
I doubt it's 400v, that's not a standard voltage here. It's much more likely to be 480/277 which is a very common industrial voltage here. It could even be 600/347 which is often used to feed large buildings or campuses which normally is stepped down to 120/208 closer to the point of use.
@@eDoc2020 It is highly likely to be 400V three phase. If you tap off from each phase you get 230V single phase. It makes delivering lots of power to a rack much easier and just about everything can handle 230V these days even if you are in North America.
@@jonathanbuzzard1376 I know _why_ 400v is often used but that doesn't change the fact that 480 is much more common here. Standard server PSUs usually are good for 100-240 volts nominal and there are also versions available for 277 volts (one leg of 480v). But you need to keep in mind this isn't a normal server, these AI servers are using over 10 kilowatts _each._ Rather than using the 230 or 277 volts of one leg they probably use the higher voltage directly. The PSUs might even have true three phase input.
@@eDoc2020 Blah, blah, blah. Let me just check... oh yes, we have some of those 10 kW AI servers in the racks at work, so I have *ACTUAL* experience of getting these things powered and cooled. The cooling is *NOT* an issue; the 12-year-old water-cooled rear doors can handle a rack full of them and still have headroom. Getting electrical power in is a different question, and 3-phase PDUs are the way it is done. Oh, and *ALL* the vendor solutions for these servers have single-phase power supplies. So stop trying to teach grandma to suck eggs. At the moment we can only get one of them per rack as we need serious power upgrades first, which are being worked on, but hey, what do I know about the subject. Note we have racks full of A100 GPU servers, but the H100s have caught us off guard on how to actually get the power into the racks and into the servers. It is a complete sh*t show frankly.
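For reference on the voltages traded back and forth above, the single-phase (line-to-neutral) figure is just the line-to-line voltage divided by √3. A quick sketch with the standard nominal values (these are textbook numbers, not measurements from the Intel facility):

from math import sqrt

for v_ll, region in [(400, "Europe"),
                     (480, "North America industrial"),
                     (208, "North America light commercial")]:
    v_ln = v_ll / sqrt(3)  # line-to-neutral = line-to-line / sqrt(3)
    print(f"{v_ll} V line-to-line ({region}) -> ~{v_ln:.0f} V line-to-neutral")

That gives roughly 230 V, 277 V, and 120 V respectively, matching the figures quoted in the comments above.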
Well, I guess the algorithm forces everyone to go all MrBeast-thumbnail. Can't blame you though. That's just how the system works now ;)
Yea :-/
WEF ?
hmm?
That's some questionable cable management for sure
Is it the tight cable turns?
@@michaelknight2342 It's the random cables velcro'd down the front of the cabinets that seem to go in random directions lmao
A lot of that is actually racks that are still in process.
I can imagine a name like Serve The Home University. Imagine all of the puns that would come about hehehe
Never seen big ass noise suppression cans in a data center. Always seen ear plugs, usually in a dispenser next to the door.
Yea, they had both, but I missed the plugs.
Usually the ear plugs are for visitors, the regular users have ear defenders.
❤
I'm not trying to be a jerk, but I'm gonna sound like a jerk: you could use some exercise, my guy.
Totally. Rough doing 32 flights in 60 days.
@@ServeTheHomeVideo busy is good ❤
In the name of Allah, the Most Gracious, the Most Merciful. O Allah, send blessings upon Muhammad and the family of Muhammad. "And He taught Adam the names, all of them; then He presented them to the angels and said, 'Inform Me of the names of these, if you are truthful.' They said, 'Glory be to You; we have no knowledge except what You have taught us. Indeed, You are the All-Knowing, the All-Wise.'" Allah the Almighty has spoken the truth.
😅😘😘😘😅😅😅😅
Intel still exist?
You live under a rock?
@@NeptuneSega No, I live in the present. Their 64-bit technology failed, their CPUs are still on 7nm-class technology when everybody else is at 5nm and 4nm, their latest consumer 14xxx family "refresh" is ridiculous, they lost all the console and Mac market, their discrete GPU releases have been laughable, and their server family has been expensive and obsolete by ages, in the process of losing the industry market.
@MaxCarponera oh so you do know about them
Why do you ask if they still exist if you know they do?