When I worked in a data center, a redundant 25kW rack was excessive, reserved for clients who requested a high-density rack as a project requirement. Today 50kW is "normal", 120kW is the new limit, and there are people talking about 200kW ultra density. Working in a data center is increasingly dangerous.
It's getting to the point "grid" voltage has to be supplied to the rack. 120k @ 240V is *500A* I use 4/0 welding cable in the EV to support loads like that. (and it's soft limited to 300A)
The biggest issue at these crazy power draws is that if the cooling water stops flowing for any reason whatsoever you have to have a system in place to throw the breaker for an emergency shutdown or your system is literally toast. There is no time whatsoever for an orderly shutdown.
I remember having an outage in 2015, my rack was using only 32kW, but the outage was caused by the back generator frying the power board, someone literally died that day, it was on the city newspaper, my DC was in a small city but closer to big financial centers, so it had sub-ms latency. I felt justified in my 2h outage of my service because of it, I had to fail-over to the auxiliary DC, which was my staging rack. I can't imagine how insane it is now.
@@jfbeam I wonder why do they still supply 110V to servers, or even 240V. why not 48V DC direct to a step-down converter instead of a PSU ? wouldn't that be more efficient at the cost of having a huge copper bar on the back of the servers going to a gigantic PSU on the bottom directly supplied by 2000V ? may that's why, it would cost more in proper personnel to mess with the servers if there's a copper bar carrying 500A exposed
I've been building data centers for nearly 20 years - it blows my mind to see what the enterprise and hyperscale sites are consuming. I also live in VA, so the vast majoruty of HSDC's are in my back yard. The energy footprint of these are beyond most peoples comprehension.
Data centers are literally sprouting out of the ground all around me in NOVA. I think there are six new facility's going up within walking distance of my house 0.0
@@Rushil69420 VAST majority are commercial leased properties. DOD/IC doesn't tend to colo or lease, they generally like to own their own space for classified+. Building to SCIF standards goes well beyond TIER. But there are definitely exceptions.
I'm sorry for being "that guy" but I am a bit appalled at the rhetoric around hydrogen as fuel. Hydrogen does not grow on trees. It's not a "power source". It is a method of storing and transporting energy. It's only as clean as the technology used to generate and transport it. By itself, it's like a battery.
I've tilted at that windmill for decades. People. Do. Not. Understand. Just like with the "green" EV... where the f... does the power come from? H2 is a _horrible_ energy storage or transport method. It takes an unimaginable amount of energy to hack off the hydrogen, and then even more insane energy levels to compress it to a remotely sensible-to-use density. (do a bit of research w.r.t. the physical volume of tanks in a fuel cell car vs. the actual amount of H2 in there, and the _thousands_ of PSI required to get even that little.)
Pumped hydro gets 70-80% round trip efficiency (power out/power in), utility scale batteries are similar. Hydrogen is...18-46%. Wow, yeah that's bad. Maybe if you're getting solar power from central California at $-21/MWh (negative, not a typo, real price at times) you can make the economics work.
well... so much for all those net-zero targets. Microsoft is using 30% more power than when they said they would be net 0 by 2023 lmao. from what I gather, their explanation is: "... yea.... but AI!"
Today was Samsung's keynote livestream stream The description had "Galaxy AI" even way before the stream started. Every major product announcement had big bold mentions of "Galaxy AI"
A lot of that has to do with how we see the world around us, for example: I was a service installer many years ago and I used to service amazing places with gorgeous views. I'd always comment on how glorious and amazing the views were to the home owner, and each reply understand what meant, but since they were seeing it day after day. It really lost its charm and wow factor. Even though the view really didn't change, but they did. They normalized the amazing. This happens to all of us unfortunately and there's very little that can be done.
All the CEOs happy to blow shareholder funds on shiny shiny because FOMO and they have no clue what 'AI' really means. Might make for some real bargain GPUs when they go bust.
And the power costs alone will be very noticeable. Nvidia estimates that an H100 at a utilization rate of 61% will match the average US household power draw. Are you ready to double your power bill for AI?
@@r0galik Most of them are not PCIE (some are) so it's not like you can just pop them in any system. H100 sxm version has a TDP of 700w which is more power than I use when AC isn't running.
Its called the "Rebound Effect" and its the observed phenomenon that technological improvements in effeciency NEVER result in a net decrease in power consumption. Its similar to the fact that despite being the most productive workers in human history, current workers are working MORE than ever. A classic example from this perspective is a driver who substitutes a vehicle with a fuel-efficient version, only to reap the benefits of its lower operating expenses to commute longer and more frequently and thus resulting in a NET INCREASE in energy consumption. It's a product of Capitalism and the paradigm of perpetual expansion. But also demonstrates how technology cannot solve the environmental and reaource crisis.
well it is testing with tidal power and wind power to add to its thermal power plants to get away from non-renewable power. So the power might not be there as right now the higher power data centers need their own nuclear plant.
Not remotely a new trend. In these parts (NC), data centers have been replacing industrial facilities for decades. (textile mills, furniture factories, etc.)
servers used to be stupidly power efficient but when "AI" hit they all went crazy, no matter the investment the latest shiny thing brings crazy returns so shy not pump a whole power plant worth of energy through your server just to generate few images?
Eh, stupidly power efficient is relative. Yes, they were using far less power per server but often did multiple times less work at the same time. That said, I'm pretty sure that I heard "it's only 2W per core!" a few times already.
@@MoraFermi well, actually the consumer components and to some degree even workstation ones that are stupidly inefficient, cutting the power in half reduces the compute by maybe 20% and don't get me started on graphic cards, RTX 4000 Ada SFF is a monster thanks to nearly linear scaling of GPU compute with "cores" and clocking it around the optimal values
Funny story (now): project to move servers to a DC, told they were supplied by 2 different power stations on opposite sides of the city, 2 UPS systems on site, etc, etc. One Monday morning, servers all offline - power cut from 1 power station caused that UPS to trip, then the other overloaded. And why weren't we told? The control room was fed from the same UPS so also went ☠️ Can you say SPOF??? 🤦🤦
Lookup commercial / industrial power rates. But, yes, modern data centers have huge power bills. There are several "small" DC's around here (RTP/RDU) that have multiple 50MW connections.
Power and cooling: many years ago a customer selling HPC systems told me that a university had expressed their budget as “600 kW - you can split that however you need between power and cooling”
The energy per GFLOP or GIOP has greatly reduced. We just found bigger problems, more complex solutions & greater consolidation. We once had 80 cores using 40 rack units doing 128b per cycle per core, now its 1 or more 512b per cycle per core & 256+ cores in 1RU. Network power has increased but offload has reduced CPU usage (reduces power usage) & offering higher throughput means much better per Gb efficiency. It would be interesting to see how power per Giga FLop/INTop has changed over time vs absolute performance per rack.
I like AI I like playing with it. Maybe it will be super useful one day. But I don't like how it is changing the industry. The sooner it crashes the sooner everyone rehires the devs they fired
There's gonna be an AI bust here by 2025. No one has made a product yet that can't be done by previous tech and cheaper. AI ain't going away, but the early adopters are gonna eat a lot of cost and declare losses on these in the next couple years.
Yeah, that’s not going to happen. There’s a lot more to AI than chat GPT and image generation, and loads of companies are using it to improve productivity, unlock new revenue streams, and reduce costs. AI models are so ubiquitous that a lot of people are benefiting from them without even realizing it. Examples: Auto captioning on TH-cam. Weather forecasting from the National weather service. Fraud detection every time you use your credit card. Order execution on the stock market (nearly 100% done by AI tools these days). And that’s to say nothing of smaller use cases that are largely invisible to ordinary people. Best Buy is using a LLM in their call center to get transcripts of calls and summarizing each one. Previously that was all done by hand, and now the reps can just offload that task to AI and immediately move on to the next call. They’ve seen a 25% reduction in staff. Or alternatively, Google used AI to find a brand new compression algorithm that results in smaller file sizes for videos with no change in quality. They’ve shaved many petabytes off both their storage and network costs. And that’s not even counting more ambitious use cases for AI such as in the pharmaceutical industry leveraging AlphaFold to develop new drugs and treatments that weren’t possible without AI. Or what every automotive company is pursuing with regard to driver assist features (lane assist or automatic breaking) on the path to full self driving. And the military already has a full AI powered F16 ‘pilot’ that is as good or better than their best human pilots, and are developing AI drone wingmen for their next gen fighter platform. If you think AI is a bubble and going to crash and burn never to be seen again, you might want to reconsider. There’s A LOT more to AI than large language models or image generation apps.
The early adopters already sold for a profit. It is the masses or the companies trying to bring AI to the masses, that is taking the bill when the bust happen
I agree with your sentiment but I don't think everything could be done with previous tech, unless your counting penny labor in other countries, regardless cost per benefit doesn't make sense and in lots of cases it's a negative
@@jaykoerner The primary use of AI that I've seen is in a replacement for internet search. At least as far as everyday life that has the most impact on the most people. In my own life it has proven really good at coding. There are some specialized fields such as medical that I could see it making an significant impact. Or in research to spot trends. But AI right now is not trustworthy enough to not be double checked by humans in either of those fields. But again, monetization is the key and there currently isn't enough of that to justify the infrastructure build out we are seeing.
@@ServeTheHomeVideo I hear you, I struggle the understand how to measure MPG and electric cars LOL, and I am a car guy and a tech guy as well, now, how much an average household consumes more when you have a rack or 2 downstairs? sign me up!!
@@ServeTheHomeVideo Another example: the Tesla Semi... 1.2MW pack. One of these servers would kill TWO of those packs per day. (One of those systems uses more power PER HOUR than my entire house does _in 4 days_ )
At Hurricane electric in Fremont where we rent racks, you pay for 15 AMPS peak, but fire marshal says only 12 AMPs steady state is permitted. So yeah, power consumptio is a big problem esp. for the older data centers which only have 120V power. RUnning at 220V is more efficient and gives higher power.
So true. Our AI cluster's heat is beeing used to heat the building where the Datacenter is located. No joke. It works well and the heat is at least used in Winter. And in Summer the heat isnt "destroyed" from a air conditioner, it is transfered into earth where it is cooled naturally.
That is starting to happen more. If you look at the NVL72 that uses a busbar and top and bottom power supplies given the current running through the busbar.
Seriously great look at the data center side of things. Would love to see more like this, but also doing a deeper dive into power and cooling for home labs or small offices would be useful. How am I supposed to power a 2000w+ system when a standard circuit is just 1500w?
I use to build these from the Red Iron on down. Truly amazing how just the power infrastructure works. Never mind the switch to going full fiber networking from copper. The cooling is the most interesting part. Lots of solutions, even ones that no longer use a false floor like my last data center install.
@@ServeTheHomeVideo Not if it's spec'd and built correctly. 'tho it does tend to concentrate loads on the subfloor. A rack sitting flush on concrete has a much larger contact area than 4 tiny feet... or for raise flooring, maybe 6 pads. This is one of the many reasons in-row cooling, and even in-rack (rack door, water, etc.) has become so popular. However, with a 100K+ load per rack, there's no way you'll cool that with air.
We never stopped complaining. Crypto crap is entirely a waste of time and money. At least "AI" produces something of measurable value. (small as it may be to most.)
I find it saddening (or stressful? idk to be honest) that we're expecting future one unit server to draw in the 10s of kW but the science behind energy production didn't follow as much. But of course, we'll get better infrastructures and more compute power for research. If only there was a sustainable way to produce all that electricity, we'd enter a new informatic era. (sustainable : economically, environmentally, socialy and more)
@@ServeTheHomeVideo love your videos, but we just don't need AI generated images, they look weird and uncreative no matter what you do. I would maybe just have photoshopped a 6U GPU server or a Nvidia HGX100, in front of stock photo of a big power plant, and added a caption. It's simple and not distracting, like your usual thumbnails, which are perfectly fine and functional.
So the Vertiv Powerbar is 1000V. Then the boxes you mount let you do things like two lower voltage circuits for PDUs to plug into with breakers onboard.
B1M had a video this morning about using the waste heat generated by datacenters for various purposes. That video was targeted more towards average people, but would love to see more tech-focused content on this.
You migght have been able to fill a rack with just 16kw of power, in HPC land three 32A 240VAC PDUs where what we where using 12 years ago and even then there was not enough power in the rack to have the compute nodes plugged into two different PDUs.
To put in perspective direct liquid cooling is essential if you want over about 80kw in a rack. Put simply you cannot buy air cooled servers that would go in your rack and draw more power. For example an air cooled Nvidia DGX server is 12kw and 8u of space so 6 would consume 48u which is an oversized rack to begin with but its only 72kW power draw. The numbers are similar for compute based servers as well.
In the Datacenter power is not the issue but cooling capacity. A Datacenter is measured by cooling not by power as that is usually the least problem to get from the grid
Cooling used to be an issue. Then people gave up and embraced liquid cooling. Air can only get you so far, then you need to bring the water from the CRAC to the actual equipment.
great video, I never would've thought about servers having ballooning power envelopes. I hope simpler more traditional server needs can be achieved with distributed low-power machines like SBCs. Though I expect it'll just stick to the same hardware but one server does the job of several. Not looking forward to every AI hack company guzzling server and power resources on their dumb ideas.
Outside of ECC memory and very few other features, the top end of servers are just scaled up laptop parts. The CPU may be 500W, but it is mainly the same part as your 5-15W laptop CPU on a larger scale. High power systems are often more efficient than just adding 100x low power systems...
I have a customers who went from a full rack (or racks) to 2 or 3 servers (mostly moving everything to virtualization) and in the end they have room to add more VM's in the future and use less power. So in smaller deployments it can cut costs and be more efficient to consolidate to newer hardware but it's a whole different animal when talking about large enterprise/datacenter/hyperscalers.
I was working on designing my own Socket940 waterblock for the Socket940 AMD Opterons back in April 2006, when I was still in university. People used to think that watercooling was about thermals. Its only peripherially related to thermals. It's actually about power -- and how efficiently you can use the power that you have available to you, to move heat out of the systems/racks/data centers. This isn't new. For me, personally, it's an idea come to life, that's 18 years in the making.
Power consumption per node has definitely increased, but the amount of work done per node has increased even further. Applications can be executed with fewer modern nodes versus more older nodes like earlier times. In other words, to run your application, you only have to rent half a rack when earlier you had to rent the entire rack.
It will be quite interesting how the industry will end up being for normal data centers. I know for some of ours they are 5-10kw max per rack I believe in some spaces it's up to 20kw but not cheap to do. Think it will be the future moving everything to liquid or immersion.
I think you have to be more specific when comparing the AMD vs Intel processors and their power use. What is the standard geekbench values with identical memory and identical power usage ? What are the number of cores / threads ? (These can be different)
We talk about that in a bit more detail in the main site article. Geekbench 6 is just about useless for server CPUs. 5 is slightly better but still not useful. We pose it more when we started at 10-15W per core in 2008-2010 and are now at under 2W/ core with Xeon 6
@7:36 uh no... remember, your power bill is still based on what you actually use, not what your on site power infrastructure can actually support. besides, building out your power infrastructure to support 2x, 3x, 5x is just simply future proofing your facility and is a good investment for not that much additional upfront cost. most data centers today will run out of power before they will run out of available RU.
Totally correct if you are building your own data center and will be owning and operating. If you are billed directly on utility power with no minimum in your contract, then you are great. If you lease colocation racks for often better connectivity, then you will massively overpay doing power like that. I was chatting with a STH main site reader that operates a few data centers. He said he has a several hundred kW client that has been doing this for 7+ years and runs at 15% of their contracted power. When they do a refresh, they add contract power based on the new power supplies. I hear stories like this often.
We'd require a lot less energy if people building applications focused more on performance and using efficient languages. I think most developers would be surprised what a single CPU can do if they changed their techniques slightly. AI and crypto will of course blow all of that out of proportion even further.
@CD-eq8tv When we build applications, we choose tools to build with, like programming languages, libraries, etc. Each of these tools has tradeoffs, like how fast you can build features vs how fast it actually runs. Too often the decision is made in favor of developer speed, so you end up with languages that can only utilize a single CPU core, or that run orders of magnitude slower than another language. The slower things run, the more power they consume, and the more machines we need to run the app on. Other times the issue is in how we build certain features. We're under pressure, have time constraints, etc. So we choose any solution that gets the job done, even if it isnt fastest. Months later, that feature gets run hundreds or thousands of times per minute and the app feels slow or we incur more cost than necessary. TL;DR: Sometimes developers need to be ok with taking a handful of months more to choose efficient tools and solutions to problems.
Backend engineer chiming in, more often than not it’s not up to you to make the decision to spend N amount of extra engineering hours to make something more efficient. In many cases the product owner is the in charge of making that decision for us, based of time and cost it’s almost never the case where it makes sense. Both in terms of time-to-market of a feature or product, and the cost of engineers compared to the meager savings in hardware provisioning. Throwing more compute at a problem is almost always cheaper than chasing that last 10% or whatever of extra performance
@@skaylab agreed for after-the-fact improvements. With the right monitoring and data, it can be relatively quick and easy to make 10-30% improvements for a particular flow in an existing product. At least, that's been my experience.
I remember a year ago hearing someone at a big chipmaker saying "nowadays many do not pay for floor space but for power instead" and all I thought was... it has been like that for at least a decade :-)
I would love some full in-depth review about smaller 1kw-3kw vertiv ups. I normally buy Eaton and only don't get Vertiv because i have no idea about other players. Full soup to nuts reviews about some UPS would be great as Eaton is so expensive these days, that I'm my opinion has not changed in 15 years and when you are paying 4k for a UPS for a 8k server it just seems a little of of balance.
Unless you're completely out of rack space then you are not buying top end servers. You are buying at the sweet spot where cost meets performance. You are also connecting to the network using multiple 10 gig connections, not 25 or 40.
Fun video! Liquid cooling, whether it's on chip, immersion, whatever, is the future of high density compute. Has to be. Recapturing the heat load and utilizing it in the building makes too much sense ...or dollars and cents. NTM, the ability to throw cooling towers on DC's and let millions of gallons of water evaporate to the ether is inefficient and bad resource mgt. These data center owners need to show a respectable PUE and show they're good stewards of a communities resources.
the power requirements are getting rediculous, its about time that the chip company and pc manufactures comply with energy efficiency requirements of less than 500w per system and better by 2030 etc
@@ram64man The problem is that "per system" isn't really a helpful metric. With that, you'd just end up getting lots of "systems" with a single GPU in them. The problem is that people are using them for wasteful things. Which is a political problem, not a technical one.
Consumer systems are practically limited in total powerdraw, and we are around that limit today. When a normal socket/fuse can not suppoly enough power, the number of consumers willing to buy the system will drop significant. When "normal people" can not even use the top end, it is no longer a selling point for the lower end systems either.
@@ssu7653 I'm sorry, but no. It may sound absolutely insane to you (because it absolutely is) but you can get 2800W consumer ATX power supplies. For, I dunno, welding GPU diodes together at the New World main menu or something, I guess. And yes, you would need a fancier than normal outlet to plug in into to get the full rated load. (And I've seen 2800W redundant server supplies too, some of the heaviest feeling Intel Common Redundant Power Supply Standard units I ever held in my own hands) Do actual consumers use them? No, they are a very very small quantity of the installed base of consumer machines (which are typically under 300W draw), so I concede there. But as far as consumer-branded-availability? Yes, a number of crazies are trying to push them as useful to gamers (which they are not). I know what you meant, but what you said had a different semantic meaning than the intent I conceded the point to.
On the power side - there was some announcement, that Microsoft is getting its dedicated power plant for a bigger and power hungry DC. Is that the 100MW future project you talk about? :)
The GPUs are the primary available options, but as the ARM CPUs with built in NPUs are expanding the power draw should drop once they start scaling it out, or at least increase performance at the same power draw. Also the increase in M.2 based NPUs that do not need a dedicated graphics card can help scale things out as well. Even then, most of these NPUs are still only sitting in the 3-50 TOPS range, versus the mid and high range RTX cards can handle 100-1000 TOPS. With a similar ROCm enabled workload, the mid and high end RX7000 series are still within the 150-800 TOPS range.
Does anyone have graphs regarding the rise in compute power requirements as the ROM compute is scaled, eg Aschenbrenner's recent paper called "Situational Awareness" on the state and future trends of AI / LLMs
When thinking about the power consumption of these ultra-powerful servers, you should probably add the cost of cooling these beasts to the overall power consumption calculation.
Any semi-modern facility I've used allocates around 3KW on the low side and despite not running any particularly beefy servers we really feel the pinch. Worth noting I'm in telecom so we're running mostly DC equipment. It always feels sort of strange to me seeing enterprise stuff use a plain UPS since you lose a ton of your runtime from the inefficiency of converting the DC battery to AC output. I do more or less understand why DC is less common in the enterprise but it's an interesting deal
@@motmontheinternetSo you don’t actually have a UPS on DC power plants. You have a rectifier which is attached to battery banks which is different in some fairly fundamental ways. One of the bigger advantages is that it’s fairly easy to add more battery banks if you have the space - if you ever get a chance to look inside the cabinets at the base of a cell tower you’ll likely see at least one crammed to the brim with batteries for this purpose. The reason it isn’t used in more enterprise spaces is that the copper cabling is fairly expensive and terminations etc require specialized skills
Here's a better example to illustrate how much power these systems consume: the average US household consumes on average about 1.3kwh. So one of those NVIDIA racks is equivalent to about 100 US household's power.
Sooo, no mention of the data center's massive fresh water usage? And the ramifications of it, while the whole country is dealing with water scarcities?
Just to give you some idea, I know most of the new AI data centers here in Arizona are being designed for net water neutral operation. Also the fuel cells make water
Please do note how politically charged the word "fresh" is in fresh water usage. What do you expect to run those servers on? Murky shit water "fresh" from the sewage systems? The radiator in your car also uses "fresh water" (distilled) and maybe with a tiny bit of chemicals to reduce rust, but in essence, it is fresh water. The radiators inside your house that run off of central heating (if applicable), your toilet and also the shower use "fresh" water to do their thing. I know that there were some cases where actual fresh water from e.g. rivers was being sucked up and disposed of by some DC's and that this is ridiculous, but to only focus on the water aspect in a world full of dwindling resources is a bit narrow minded IMHO.
@@masterTigress96 your grievance is selective at best, that water could have been used by the locals regardless. To your point, had they taken murky sewage water, purified it, then used it. We wouldn't be having this conversation. But to say people have such a small attention span that we can't focus on more than one thing, and that this should not be what we focus on is rather dismissive. Because you can literally say that about any topic, and then nothing gets done.
@@juanmacias5922 That is true but they can filter out *any* water before "serving" it to the people. They can filter water from the ocean before you flush it down the toilet, and they can clean it up again before disposing of it, maybe even recycling it. The problem is cost and of course the ancient old turn-pee-into-water-again machine has the same failure rate as the ice cream machines at McDonalds. It comes down to cost, and we only serve so much clean, usable water, to so many people, for a given price. The question is then, if you think that the locals should take priority, how are you going to convince the DC's to agree to paying more for water that needs to be filtered or brought in some other way? The investors/stake holders of the DC's will lobby and say they are more important then the locals, and the locals will do the opposite. If we had an infinite amount of clean water and no worries of it running out or having droughts or water shortages, we wouldn't be having this conversation. So the question is, who is going to be the one that will "feast" on the shitsandwich of paying for more expensive water? The same can be said about electricity, locals could benefit from cheap electricity from a local power source (solar, wind, nuclear, etc). but that is now also being redirected to the DC's. The list goes on an on, there is a major lack of any sort of real jobs being created by DC's nor are they kept in the area after the DC is build. My point was that just the water aspect of it is just one tiny portion of a whole list of bigger issues that (could be) attributed to datacenters.
This just depresses me. We should be working towards using less power, computers are already fast enough for most day to day things people wanna do - send messages, photos, videos and look at photos, videos, documents. Instead of focusing on making all those things easier to use and more power efficient we choose to make the world unlivable and for what? A glorified search engine that can make uncanny valley style images?
It's madness. This power guzzling approach to servers is not sustainable. It's only a matter of time before politicians and and legislators begin demanding vendor stop this nonsense of designing such wasteful hardware.
They have a serious problem legislatin away from this. Either Intel/AMD/Nvdia are at the leading edge, or we hand that over to China/Russia. It is that simple, stay on top or give "the enemy" the upper hand as far as technology goes. This will VERY fast give equal more advanced military capability and loosing THAT edge would likely be the end of USA (as we know it)
@@ssu7653Russia? They don’t have anything close to Intel/AMD/Nvidia, and probably never will how things are going. And while China is trying to compete, their strategy is mainly to copy and reverse engineer, not being a technology leader (also they rely on chip fabs in Taiwan).
I think you're missing the point somewhat. A large part of the reason the power demand per system has gotten so high is because they've been able to cram WAY more capability into each system. If each new 1kW server replaces 4 500W servers then your efficiency has gotten a lot better. But off course, you can reasonably argue that the software running on these servers has gotten excessively compute hungry compared to the benefit it provides.
It is clear that data centers are both greedy in electrical power and large producers of heat, so much so that near my home a university data center has not obtained the right to be built if and only if it involves the recovery of its heat produced to become the central heating of the neighborhood's residential buildings. another solution for data centers that is a bit extreme and little seen on YT pushes the concept of liquid cooling to the grand master level! no more tubing cluttering the box to distribute the liquid to just the right places no, the grand master level is the submerged data center!! the motherboard and all its accessories (duly adapted to such a situation) find themselves drowned in a carefully chosen coolant (like that of your car - not a wise choice - but you get the idea)
Perhaps several Motherboards will end up immersed in a tub of mineral oil. There goes the case, case fans, CPU cooler, GPU cooler, Network cards and several heat-sinks. Water boils at a temperature too low for high end parts. Being able to pump oil via radiators, or heat pump tech may be an answer.
NIC's shouldn't need their own OS. But in all fairness, BOTH of the cards you held up run their own OS. (the connect-x card has onboard firmware. possibly vxworks based, I've never disassembled the firmware.) Much older cards also have "microcode", but most aren't upgradable. (I recall updating the microcode on some broadcom embedded nics on supermicro boards - they behaved badly without the IPMI card.)
I wish he added the info that currently, the most important thing for a new Datacenter is the proximity to a nuclear reactor since they are a source for consistently high amounts of energy. Especially since most of the world bought into the hype of solar and wind which are the opposite of what a datacenter needs. I do appreciate that he highlighted the fact that they do use solar and stuff to cover their energy costs tho
I frankly don’t understand why the choice was made that it was okay to break the power envelope to get more performance. Part of this is probably intel’s stagnation pre-AMD that broke the dam because it was the only way to increase performance for several gens, but how were data centers okay with just eating that cost for relatively less performance per watt?
Even if it is less performance per watt, i am 99% sure the performance per cost is WAY up. Take the total system cost including software, cooling, running cost and space over the expected lifetime then compare it to how much work is done. Performance is WAY cheaper today than it was 10 years ago.
@@prashanthb6521 I know it has increased, but the incremental gains have slowed significantly it seems. I guess my question is, when data centers realized they would have to start massively scaling up their power supply and cooling solutions, at high cost, for let's say the same % gain in performance YoY, why didn't they balk?
@@TheShortStory I think this video is making people look at this issue in a wrong way. There is no serious problem here. Assume a DC had 1000 old servers running. Now it requires only 600 new servers to run the same software load. Also these 600 new servers consume the same power as those old 1000 ones. Now every rack in that DC is 40% empty, so they decide to fill it up with more new servers, raising the total power consumption of that DC. If they dont want to spend on new power supply and cooling then the DC operators should run only 600 servers. They would still be earning the same revenue. On top of that they would be spending less on maintenance as they are operating less machines.
@@prashanthb6521 I’m sure the business case makes sense at the end of the day, because otherwise this transition to higher-power devices wouldn’t have happened. But someone is paying that price, whether it’s VC or consumers or data center operators through their margins. Yes efficiency increases but if a data center requires a major modification that is a cost for both the environment and _somebody_. I wonder if there really was demand for all that added performance before the AI craze began
The real question I have is, how many servers do I need to run to heat my house in the winter. Home space for cloud computing. I can charge Great rates in the winter.
Those panels look tired af. The fuel cell is awesome, don’t get me wrong, but is it green? It may have its place as energy storage. Because that’s what it is, but I strongly doubt that hydrogen is “green.” Hydrogen might legitimately have benefits over other storage solutions in this application. It looked like it was all truck portable. Temporary/ portable backup generation is a niche I can see hydrogen filling. Thing is, just like batteries, hydrogen on its own is only slightly greener than the alternative. It’s the energy that fills it (tank or battery) that makes it green. Let’s forget “AI.” My guess is it will end up raising power budgets substantially permanently but not nearly to the degree we are seeing in the bubble. If we could cut out just half the tracking & ad related power overhead, how much could that theoretically save? In other words, how much overall compute is spent on surveillance capitalism? Reckon we could (theoretically) legislate away a quarter to half of ad tracking BS? How much could that lower the power bill?
No to much changes when you talk about number of cpus and dimms per server. About 15 years ago, the servers were the same using 450w power supply. I used about 6KW power per rack. The processing power increased and so the power consumption.
@@ServeTheHomeVideo Yep, that help to limit thermal runaway propagation. Gaseous fire suppression is only partially effective against lithium ion batteries. The FAA did some testing, the results are online. I'd feel more comfortable with a thick concrete wall between those batteries and anything important. I could store gasoline in a DC and manage the risks but at the end of the day it's better to have the gasoline somewhere else.
to bad they can't use the heat generated and convert it back into power. also, it would be better if they just used dc power, especially when using batteries and solar. just fyi, elon(tesla) mentioned their data center will be using 500mw
The Matrix was prescient. Use "Human' as battery! That _is_ the one thing the world will never run out because if it actually happens, power will be the last of our worry.
Soon we'll see mini nuclear reactors being built alongside of new data centers. 🤣 That's just crazy. They need to build solar / wind farms to keep their carbon footprint down.
i have a dell r630 and i use the 500w power supplies because i have only 2 sata ssd for os only and 8 nvme as storage for the LLM and Image models and it running 12 dim of 4rx32gb 2300mhz ram and two xeon e5 2699v4. i am idling on about 130w and max out at 450w. also it is only one power supply used at the time. i know this is risky but it is done for power evfessency also i dont care if the server goes down and restart with the second psu if psu 1 fails. everything is automated and volatile nothing is critical on this system. it is used for LLM and image generation it has no other purpose nothing is saved on this system. every image that is created and i want to saved is on my nas .
@@ServeTheHomeVideo theoretically, the server can use more power, but in real-world conditions more than 450w was not possible for me at 100% load with a LLM or ai image creation. i dont know how much more the psu can handle or if the dell power management system switches the second psu also on. in the ipmi is only 2 option for the power management. one does distribute the power evenly, the other sets a power supply priority. but what happen in the background is not clear communicated in the ipmi and the lastes user manual that is public available is so old it even does not mention the use of the v4 cpus or 128gb ddr4 dims. i have the idrac 8 and a enterprise license luckily so some things are possible but sins i dont work in the server space or have anything to with IT systems professionally. i write programs and drivers as my hobby so i have a idea how this works but no prove to it. if it uses the second psu if power goes over the 495W that one psu can provude. nice no down time but if not is this also ok and i change the setting and use both.
When I worked in a data center, a redundant 25kW rack was excessive, reserved for clients who requested a high-density rack as a project requirement.
Today 50kW is "normal", 120kW is the new limit, and there are people talking about 200kW ultra density.
Working in a data center is increasingly dangerous.
It's getting to the point "grid" voltage has to be supplied to the rack. 120k @ 240V is *500A* I use 4/0 welding cable in the EV to support loads like that. (and it's soft limited to 300A)
The biggest issue at these crazy power draws is that if the cooling water stops flowing for any reason whatsoever you have to have a system in place to throw the breaker for an emergency shutdown or your system is literally toast. There is no time whatsoever for an orderly shutdown.
I remember having an outage in 2015, my rack was using only 32kW, but the outage was caused by the back generator frying the power board, someone literally died that day, it was on the city newspaper, my DC was in a small city but closer to big financial centers, so it had sub-ms latency.
I felt justified in my 2h outage of my service because of it, I had to fail-over to the auxiliary DC, which was my staging rack.
I can't imagine how insane it is now.
@@jfbeam I wonder why do they still supply 110V to servers, or even 240V. why not 48V DC direct to a step-down converter instead of a PSU ? wouldn't that be more efficient at the cost of having a huge copper bar on the back of the servers going to a gigantic PSU on the bottom directly supplied by 2000V ?
may that's why, it would cost more in proper personnel to mess with the servers if there's a copper bar carrying 500A exposed
how long till a DC needs its own nuclear plant to operate? we're lucky if it's a nuclear plant and not coal-powered
I've been building data centers for nearly 20 years - it blows my mind to see what the enterprise and hyperscale sites are consuming. I also live in VA, so the vast majoruty of HSDC's are in my back yard. The energy footprint of these are beyond most peoples comprehension.
The first time I drove through VA I was shocked
Data centers are literally sprouting out of the ground all around me in NOVA. I think there are six new facility's going up within walking distance of my house 0.0
Imagine the climate change! But it's ok...it's (D)ifferent when it's the elites.
@@mr_jarble Northern Virginia, eh? Im sure the government isn’t up to anything at all in that neck of the woods, right?
@@Rushil69420 VAST majority are commercial leased properties. DOD/IC doesn't tend to colo or lease, they generally like to own their own space for classified+. Building to SCIF standards goes well beyond TIER. But there are definitely exceptions.
I'm sorry for being "that guy" but I am a bit appalled at the rhetoric around hydrogen as fuel. Hydrogen does not grow on trees. It's not a "power source". It is a method of storing and transporting energy. It's only as clean as the technology used to generate and transport it. By itself, it's like a battery.
Totally fair. Just wanted to show the solution. Many have never seen anything like that
I've tilted at that windmill for decades. People. Do. Not. Understand. Just like with the "green" EV... where the f... does the power come from? H2 is a _horrible_ energy storage or transport method. It takes an unimaginable amount of energy to hack off the hydrogen, and then even more insane energy levels to compress it to a remotely sensible-to-use density. (do a bit of research w.r.t. the physical volume of tanks in a fuel cell car vs. the actual amount of H2 in there, and the _thousands_ of PSI required to get even that little.)
Pumped hydro gets 70-80% round trip efficiency (power out/power in), utility scale batteries are similar. Hydrogen is...18-46%. Wow, yeah that's bad. Maybe if you're getting solar power from central California at $-21/MWh (negative, not a typo, real price at times) you can make the economics work.
well... so much for all those net-zero targets. Microsoft is using 30% more power than when they said they would be net 0 by 2023 lmao. from what I gather, their explanation is: "... yea.... but AI!"
It's incredible how fast AI images went from being impressive to just looking cheap and scammy by association
AI generated art is being feed into machine learning.
Sorry to rip this bandaid off but stable diffusion images are already indistinguishable from real photography
Today was Samsung's keynote livestream stream
The description had "Galaxy AI" even way before the stream started.
Every major product announcement had big bold mentions of "Galaxy AI"
A lot of that has to do with how we see the world around us, for example: I was a service installer many years ago and I used to service amazing places with gorgeous views. I'd always comment on how glorious and amazing the views were to the home owner, and each reply understand what meant, but since they were seeing it day after day. It really lost its charm and wow factor. Even though the view really didn't change, but they did. They normalized the amazing. This happens to all of us unfortunately and there's very little that can be done.
@@ktfjulien lol you might need glasses
All the CEOs happy to blow shareholder funds on shiny shiny because FOMO and they have no clue what 'AI' really means.
Might make for some real bargain GPUs when they go bust.
I don’t think you want to or are even able to use a DC / „AI“ GPU outside of their servers though :/
@@Felix-ve9hs why? GPUs hardly degrade.
You're going to need to do a lot more in terms of power and cooling at home.
And the power costs alone will be very noticeable. Nvidia estimates that an H100 at a utilization rate of 61% will match the average US household power draw. Are you ready to double your power bill for AI?
@@r0galik Most of them are not PCIE (some are) so it's not like you can just pop them in any system. H100 sxm version has a TDP of 700w which is more power than I use when AC isn't running.
Its called the "Rebound Effect" and its the observed phenomenon that technological improvements in effeciency NEVER result in a net decrease in power consumption. Its similar to the fact that despite being the most productive workers in human history, current workers are working MORE than ever.
A classic example from this perspective is a driver who substitutes a vehicle with a fuel-efficient version, only to reap the benefits of its lower operating expenses to commute longer and more frequently and thus resulting in a NET INCREASE in energy consumption.
It's a product of Capitalism and the paradigm of perpetual expansion. But also demonstrates how technology cannot solve the environmental and reaource crisis.
Iceland might as well just turn itself into a giant country-sized data centre at this point.
well it is testing with tidal power and wind power to add to its thermal power plants to get away from non-renewable power. So the power might not be there as right now the higher power data centers need their own nuclear plant.
There's a reason why so many data centers are being proposed next to power plants with exclusive power deals.
Yes
Not remotely a new trend. In these parts (NC), data centers have been replacing industrial facilities for decades. (textile mills, furniture factories, etc.)
Damn, my whole house only consumes a couple millicybertrucks.
Emmett Brown : "Marty, I'm sorry, but the only power source capable of generating 1.21 gigawatts of electricity is a bolt of lightning."
Was entirely true in the 1950s. Now we have nvidia AI chips capable of doing that in 24 hours.
Jigga who? Gigawatt
Great Scott!
@@ericnewton5720 24h ? they do it in 1h
servers used to be stupidly power efficient
but when "AI" hit they all went crazy, no matter the investment the latest shiny thing brings crazy returns so shy not pump a whole power plant worth of energy through your server just to generate few images?
Hey now, those AI clusters need to train for months looking at millions of images in order to spit out some nightmare fuel photo!!!!
🤣🤣
Eh, stupidly power efficient is relative. Yes, they were using far less power per server but often did multiple times less work at the same time.
That said, I'm pretty sure that I heard "it's only 2W per core!" a few times already.
@@MoraFermi well, actually the consumer components and to some degree even workstation ones that are stupidly inefficient, cutting the power in half reduces the compute by maybe 20%
and don't get me started on graphic cards, RTX 4000 Ada SFF is a monster thanks to nearly linear scaling of GPU compute with "cores" and clocking it around the optimal values
It happened even before AI if you see the charts in the first part
Please. With that power, Ai can generate more suggestive images in one minute than porn has in 40 years… progress!
Funny story (now): project to move servers to a DC, told they were supplied by 2 different power stations on opposite sides of the city, 2 UPS systems on site, etc, etc.
One Monday morning, servers all offline - power cut from 1 power station caused that UPS to trip, then the other overloaded. And why weren't we told? The control room was fed from the same UPS so also went ☠️
Can you say SPOF??? 🤦🤦
That's a real SNAFU.
Yikes!
It’s a 0.5N-1 data center!
An untested backup can’t be trusted.
Now that's some FUBAR.
Nice video. More POWER to you, ServeTheHome.
Fun fact: 1 MW power will roughly power 1000 homes in most parts of he world.
Ha!
I loved this video, so cool to see the various data center solutions. Thanks !
Thanks Jim
At 20 cents per kWh that's $210k per year. Just to let the thing. That's mental.
Yea. Huge costs
Lookup commercial / industrial power rates. But, yes, modern data centers have huge power bills. There are several "small" DC's around here (RTP/RDU) that have multiple 50MW connections.
380V DC (Direct Current) power architecture for Datacentres is another interesting venture. 😎😎😎
Power and cooling: many years ago a customer selling HPC systems told me that a university had expressed their budget as “600 kW - you can split that however you need between power and cooling”
That's a surprisingly practical budget constraint. I take it professors/department personnel were running the project, not university administrators?
The energy per GFLOP or GIOP has greatly reduced. We just found bigger problems, more complex solutions & greater consolidation. We once had 80 cores using 40 rack units doing 128b per cycle per core, now its 1 or more 512b per cycle per core & 256+ cores in 1RU.
Network power has increased but offload has reduced CPU usage (reduces power usage) & offering higher throughput means much better per Gb efficiency.
It would be interesting to see how power per Giga FLop/INTop has changed over time vs absolute performance per rack.
I really don't think those large AI clusters will ever go away from liquid cooling anytime soon. Love the behind the scenes!
AI is the new crypto, except even more insidious - the bursting of the bubble can't come soon enough
The issue is it might change the world?!
Then again didn't we say the same thing about crypto...
🤔
@@TheWebstaff Crypto did change the world, though just not the way it was promised
And as always when a new goldrush comes up, the ones getting rich are the ones selling shovels and pickaxes.
I like AI I like playing with it. Maybe it will be super useful one day. But I don't like how it is changing the industry. The sooner it crashes the sooner everyone rehires the devs they fired
@@Felix-ve9hs 😂😂 very true
There's gonna be an AI bust here by 2025. No one has made a product yet that can't be done by previous tech and cheaper. AI ain't going away, but the early adopters are gonna eat a lot of cost and declare losses on these in the next couple years.
There's a new Gartner report which might be the start of the end.
Yeah, that’s not going to happen. There’s a lot more to AI than chat GPT and image generation, and loads of companies are using it to improve productivity, unlock new revenue streams, and reduce costs. AI models are so ubiquitous that a lot of people are benefiting from them without even realizing it. Examples:
Auto captioning on TH-cam.
Weather forecasting from the National weather service.
Fraud detection every time you use your credit card.
Order execution on the stock market (nearly 100% done by AI tools these days).
And that’s to say nothing of smaller use cases that are largely invisible to ordinary people. Best Buy is using a LLM in their call center to get transcripts of calls and summarizing each one. Previously that was all done by hand, and now the reps can just offload that task to AI and immediately move on to the next call. They’ve seen a 25% reduction in staff. Or alternatively, Google used AI to find a brand new compression algorithm that results in smaller file sizes for videos with no change in quality. They’ve shaved many petabytes off both their storage and network costs.
And that’s not even counting more ambitious use cases for AI such as in the pharmaceutical industry leveraging AlphaFold to develop new drugs and treatments that weren’t possible without AI. Or what every automotive company is pursuing with regard to driver assist features (lane assist or automatic breaking) on the path to full self driving. And the military already has a full AI powered F16 ‘pilot’ that is as good or better than their best human pilots, and are developing AI drone wingmen for their next gen fighter platform.
If you think AI is a bubble and going to crash and burn never to be seen again, you might want to reconsider. There’s A LOT more to AI than large language models or image generation apps.
The early adopters already sold for a profit. It is the masses or the companies trying to bring AI to the masses, that is taking the bill when the bust happen
I agree with your sentiment but I don't think everything could be done with previous tech, unless your counting penny labor in other countries, regardless cost per benefit doesn't make sense and in lots of cases it's a negative
@@jaykoerner The primary use of AI that I've seen is in a replacement for internet search. At least as far as everyday life that has the most impact on the most people.
In my own life it has proven really good at coding.
There are some specialized fields such as medical that I could see it making an significant impact.
Or in research to spot trends.
But AI right now is not trustworthy enough to not be double checked by humans in either of those fields.
But again, monetization is the key and there currently isn't enough of that to justify the infrastructure build out we are seeing.
Rather than cybertruck, compare it to a house, more relatable ;)
A house does not move. My thought was people can get some idea of how much energy it takes to take around 7000lbs 330 miles
@@ServeTheHomeVideo I hear you, I struggle the understand how to measure MPG and electric cars LOL, and I am a car guy and a tech guy as well, now, how much an average household consumes more when you have a rack or 2 downstairs? sign me up!!
@@ServeTheHomeVideo Another example: the Tesla Semi... 1.2MW pack. One of these servers would kill TWO of those packs per day.
(One of those systems uses more power PER HOUR than my entire house does _in 4 days_ )
At Hurricane electric in Fremont where we rent racks, you pay for 15 AMPS peak, but fire marshal says only 12 AMPs steady state is permitted. So yeah, power consumptio is a big problem esp. for the older data centers which only have 120V power. RUnning at 220V is more efficient and gives higher power.
I think they can put in higher power circuits even at HE
So true. Our AI cluster's heat is beeing used to heat the building where the Datacenter is located. No joke. It works well and the heat is at least used in Winter. And in Summer the heat isnt "destroyed" from a air conditioner, it is transfered into earth where it is cooled naturally.
Heat is never “destroyed”…. it’s simply moved from one place to another.
@@ericnewton5720 Call it what you want, we dont need an AC to control the room.
That's how I heat my house in the winter. If I'm paying for the energy, might as well get everything I can out of it.
I still don’t understand why each server has its own power supply and they don’t centeralize them into on large power supply
That is starting to happen more. If you look at the NVL72 that uses a busbar and top and bottom power supplies given the current running through the busbar.
@@ServeTheHomeVideo cool thanks
Seriously great look at the data center side of things. Would love to see more like this, but also doing a deeper dive into power and cooling for home labs or small offices would be useful. How am I supposed to power a 2000w+ system when a standard circuit is just 1500w?
I ended up pulling two 30A 208v lines for my home lab. One is for spot cooling and the other keeps the servers happy pulling back 6kw :)
I use to build these from the Red Iron on down. Truly amazing how just the power infrastructure works. Never mind the switch to going full fiber networking from copper.
The cooling is the most interesting part. Lots of solutions, even ones that no longer use a false floor like my last data center install.
Yea raised floor is good in some ways but also usually has lower load capacity. That is a big deal for the immersion guys
@@ServeTheHomeVideo Not if it's spec'd and built correctly. 'tho it does tend to concentrate loads on the subfloor. A rack sitting flush on concrete has a much larger contact area than 4 tiny feet... or for raise flooring, maybe 6 pads. This is one of the many reasons in-row cooling, and even in-rack (rack door, water, etc.) has become so popular. However, with a 100K+ load per rack, there's no way you'll cool that with air.
Dejavu of large mainframe computer infrastructure of the late 1990s through today but never disclosed.
remember when people were railing out about cryptocurrency mining power usage, pepperidge farm remembers
We never stopped complaining. Crypto crap is entirely a waste of time and money. At least "AI" produces something of measurable value. (small as it may be to most.)
Had to check my playback speed was still 1.0x
Sorry. The baby is 6 weeks old so I am pretty tired still. Hopefully when he sleeps longer I will get back to normal speed.
I am apparently over caffeinated as 1.25 sounded normal to me lol
I find it saddening (or stressful? idk to be honest) that we're expecting future one unit server to draw in the 10s of kW but the science behind energy production didn't follow as much. But of course, we'll get better infrastructures and more compute power for research. If only there was a sustainable way to produce all that electricity, we'd enter a new informatic era. (sustainable : economically, environmentally, socialy and more)
Historically, if there is value, then people engineer a way to make things work.
The AI bubble can't pop soon enough, for the sake of the environment as well, but also so we don't get such bad thumbnail images.
How would you thumbnail this? I spent hours and asked a bunch of folks
@@ServeTheHomeVideo love your videos, but we just don't need AI generated images, they look weird and uncreative no matter what you do.
I would maybe just have photoshopped a 6U GPU server or a Nvidia HGX100, in front of stock photo of a big power plant, and added a caption. It's simple and not distracting, like your usual thumbnails, which are perfectly fine and functional.
Are these rail-mounted powerbox-thingys just distribution for 12V DV or ~200V AC?
So the Vertiv Powerbar is 1000V. Then the boxes you mount let you do things like two lower voltage circuits for PDUs to plug into with breakers onboard.
The ai generated thumbnails really turn me off these vids ngl
If you have ideas happy to change. I was with Wendell this morning telling him I had no idea what to do
@@ServeTheHomeVideo Have you considered paying a human being for their labor?
@@justplaingarakyeah, it might sound like a hassle but that would feel the most genuine 🙂
@@jc_dogen I can imagine, just read a comment about it 😐
@@ServeTheHomeVideo Something photobashed from photos or even drawn in paint would have more value in my eyes, Aaron said it well also.
engineers are now less focused on efficiency after they figured out they can just throw power at the issue
B1M had a video this morning about using the waste heat generated by datacenters for various purposes. That video was targeted more towards average people, but would love to see more tech-focused content on this.
You migght have been able to fill a rack with just 16kw of power, in HPC land three 32A 240VAC PDUs where what we where using 12 years ago and even then there was not enough power in the rack to have the compute nodes plugged into two different PDUs.
To put in perspective direct liquid cooling is essential if you want over about 80kw in a rack. Put simply you cannot buy air cooled servers that would go in your rack and draw more power. For example an air cooled Nvidia DGX server is 12kw and 8u of space so 6 would consume 48u which is an oversized rack to begin with but its only 72kW power draw. The numbers are similar for compute based servers as well.
In the Datacenter power is not the issue but cooling capacity.
A Datacenter is measured by cooling not by power as that is usually the least problem to get from the grid
Not anymore. If you were to talk to people building big new DCs or large DC REITS power is the #1 concern these days
Cooling used to be an issue. Then people gave up and embraced liquid cooling. Air can only get you so far, then you need to bring the water from the CRAC to the actual equipment.
great video, I never would've thought about servers having ballooning power envelopes.
I hope simpler more traditional server needs can be achieved with distributed low-power machines like SBCs.
Though I expect it'll just stick to the same hardware but one server does the job of several.
Not looking forward to every AI hack company guzzling server and power resources on their dumb ideas.
Outside of ECC memory and very few other features, the top end of servers are just scaled up laptop parts. The CPU may be 500W, but it is mainly the same part as your 5-15W laptop CPU on a larger scale. High power systems are often more efficient than just adding 100x low power systems...
I have a customers who went from a full rack (or racks) to 2 or 3 servers (mostly moving everything to virtualization) and in the end they have room to add more VM's in the future and use less power. So in smaller deployments it can cut costs and be more efficient to consolidate to newer hardware but it's a whole different animal when talking about large enterprise/datacenter/hyperscalers.
I was working on designing my own Socket940 waterblock for the Socket940 AMD Opterons back in April 2006, when I was still in university.
People used to think that watercooling was about thermals.
Its only peripherially related to thermals.
It's actually about power -- and how efficiently you can use the power that you have available to you, to move heat out of the systems/racks/data centers.
This isn't new. For me, personally, it's an idea come to life, that's 18 years in the making.
Power consumption per node has definitely increased, but the amount of work done per node has increased even further. Applications can be executed with fewer modern nodes versus more older nodes like earlier times. In other words, to run your application, you only have to rent half a rack when earlier you had to rent the entire rack.
It will be quite interesting how the industry will end up being for normal data centers. I know for some of ours they are 5-10kw max per rack I believe in some spaces it's up to 20kw but not cheap to do.
Think it will be the future moving everything to liquid or immersion.
What's being done to recuperate the insane amount of energy lost as heat? I know in some places it's being used to heat neighborhoods.
Depends on the data center. You are totally right that this is an area with untapped potential.
Patric are you going to cover the Oxide Cloud Computer?
While watching the video I kept worrying that your swinging hands would hit the vertical server making it fall.
Three Hundred Thousand Blackwells?
Huh. Yeah. Nothing to see there. I’m sure.
Stay tuned to STH in late August/ early September for something very big.
As long as my calls keep printing!
I think you have to be more specific when comparing the AMD vs Intel processors and their power use.
What is the standard geekbench values with identical memory and identical power usage ?
What are the number of cores / threads ? (These can be different)
We talk about that in a bit more detail in the main site article. Geekbench 6 is just about useless for server CPUs. 5 is slightly better but still not useful. We pose it more when we started at 10-15W per core in 2008-2010 and are now at under 2W/ core with Xeon 6
@7:36 uh no... remember, your power bill is still based on what you actually use, not what your on site power infrastructure can actually support. besides, building out your power infrastructure to support 2x, 3x, 5x is just simply future proofing your facility and is a good investment for not that much additional upfront cost. most data centers today will run out of power before they will run out of available RU.
Totally correct if you are building your own data center and will be owning and operating. If you are billed directly on utility power with no minimum in your contract, then you are great. If you lease colocation racks for often better connectivity, then you will massively overpay doing power like that. I was chatting with a STH main site reader that operates a few data centers. He said he has a several hundred kW client that has been doing this for 7+ years and runs at 15% of their contracted power. When they do a refresh, they add contract power based on the new power supplies. I hear stories like this often.
Are those power bars distributing 220V, or higher voltage?
We'd require a lot less energy if people building applications focused more on performance and using efficient languages. I think most developers would be surprised what a single CPU can do if they changed their techniques slightly. AI and crypto will of course blow all of that out of proportion even further.
What are you referring to? I’m genuinely curious if you don’t mind explaining
@CD-eq8tv When we build applications, we choose tools to build with, like programming languages, libraries, etc. Each of these tools has tradeoffs, like how fast you can build features vs how fast it actually runs. Too often the decision is made in favor of developer speed, so you end up with languages that can only utilize a single CPU core, or that run orders of magnitude slower than another language. The slower things run, the more power they consume, and the more machines we need to run the app on.
Other times the issue is in how we build certain features. We're under pressure, have time constraints, etc. So we choose any solution that gets the job done, even if it isnt fastest. Months later, that feature gets run hundreds or thousands of times per minute and the app feels slow or we incur more cost than necessary.
TL;DR: Sometimes developers need to be ok with taking a handful of months more to choose efficient tools and solutions to problems.
@@ryanseipp6944This has nothing to do with AI.
Backend engineer chiming in, more often than not it’s not up to you to make the decision to spend N amount of extra engineering hours to make something more efficient. In many cases the product owner is the in charge of making that decision for us, based of time and cost it’s almost never the case where it makes sense. Both in terms of time-to-market of a feature or product, and the cost of engineers compared to the meager savings in hardware provisioning. Throwing more compute at a problem is almost always cheaper than chasing that last 10% or whatever of extra performance
@@skaylab agreed for after-the-fact improvements. With the right monitoring and data, it can be relatively quick and easy to make 10-30% improvements for a particular flow in an existing product. At least, that's been my experience.
When you are so old that you remember data centers signing cage lease contracts without ever specifying anything about electricity costs.
I remember a year ago hearing someone at a big chipmaker saying "nowadays many do not pay for floor space but for power instead" and all I thought was... it has been like that for at least a decade :-)
I would love some full in-depth review about smaller 1kw-3kw vertiv ups. I normally buy Eaton and only don't get Vertiv because i have no idea about other players. Full soup to nuts reviews about some UPS would be great as Eaton is so expensive these days, that I'm my opinion has not changed in 15 years and when you are paying 4k for a UPS for a 8k server it just seems a little of of balance.
We might start doing more of these, but our traffic on those reviews is much lower than our average
Unless you're completely out of rack space then you are not buying top end servers. You are buying at the sweet spot where cost meets performance. You are also connecting to the network using multiple 10 gig connections, not 25 or 40.
Fun video! Liquid cooling, whether it's on chip, immersion, whatever, is the future of high density compute. Has to be. Recapturing the heat load and utilizing it in the building makes too much sense ...or dollars and cents. NTM, the ability to throw cooling towers on DC's and let millions of gallons of water evaporate to the ether is inefficient and bad resource mgt. These data center owners need to show a respectable PUE and show they're good stewards of a communities resources.
the power requirements are getting rediculous, its about time that the chip company and pc manufactures comply with energy efficiency requirements of less than 500w per system and better by 2030 etc
We have a big AMD Siena series on the main site and we are working on getting more Xeon 6E 1P content
@@ram64man The problem is that "per system" isn't really a helpful metric. With that, you'd just end up getting lots of "systems" with a single GPU in them.
The problem is that people are using them for wasteful things. Which is a political problem, not a technical one.
Consumer systems are practically limited in total powerdraw, and we are around that limit today. When a normal socket/fuse can not suppoly enough power, the number of consumers willing to buy the system will drop significant. When "normal people" can not even use the top end, it is no longer a selling point for the lower end systems either.
@@ssu7653 I'm sorry, but no. It may sound absolutely insane to you (because it absolutely is) but you can get 2800W consumer ATX power supplies. For, I dunno, welding GPU diodes together at the New World main menu or something, I guess. And yes, you would need a fancier than normal outlet to plug in into to get the full rated load.
(And I've seen 2800W redundant server supplies too, some of the heaviest feeling Intel Common Redundant Power Supply Standard units I ever held in my own hands)
Do actual consumers use them? No, they are a very very small quantity of the installed base of consumer machines (which are typically under 300W draw), so I concede there.
But as far as consumer-branded-availability? Yes, a number of crazies are trying to push them as useful to gamers (which they are not).
I know what you meant, but what you said had a different semantic meaning than the intent I conceded the point to.
If servers are guzzling more power they are also doing more work nowadays.
120KW rack is only at 70% utilisation I assume that rack liquid cooling at the server level.
Wondering when we switch to 24V or 48V or higher direct to the chip, 12V seems like it will soon be a limiting factor.
Dell's XE Ai server line up all have 4x 2500 watt PSUs... weeeeee
Yea nowhere near enough
@@ServeTheHomeVideothe H100 8 way DGX boxes have six 3kW PSUs
On the power side - there was some announcement, that Microsoft is getting its dedicated power plant for a bigger and power hungry DC. Is that the 100MW future project you talk about? :)
The GPUs are the primary available options, but as the ARM CPUs with built in NPUs are expanding the power draw should drop once they start scaling it out, or at least increase performance at the same power draw. Also the increase in M.2 based NPUs that do not need a dedicated graphics card can help scale things out as well. Even then, most of these NPUs are still only sitting in the 3-50 TOPS range, versus the mid and high range RTX cards can handle 100-1000 TOPS. With a similar ROCm enabled workload, the mid and high end RX7000 series are still within the 150-800 TOPS range.
Does anyone have graphs regarding the rise in compute power requirements as the ROM compute is scaled, eg Aschenbrenner's recent paper called "Situational Awareness" on the state and future trends of AI / LLMs
When thinking about the power consumption of these ultra-powerful servers, you should probably add the cost of cooling these beasts to the overall power consumption calculation.
Next time my family complain about leaving lights on, im going to show them this video
I mean, that's not going to affect your electricity bill, you should still turn your lights off if you aren't using them. lol
companies earn money while using server energy.
@@JeckNoTree i do hope you got the satire in my comment
Tesla's gravy is going to be its energy division more than self driving.
Fair
Any semi-modern facility I've used allocates around 3KW on the low side and despite not running any particularly beefy servers we really feel the pinch.
Worth noting I'm in telecom so we're running mostly DC equipment. It always feels sort of strange to me seeing enterprise stuff use a plain UPS since you lose a ton of your runtime from the inefficiency of converting the DC battery to AC output. I do more or less understand why DC is less common in the enterprise but it's an interesting deal
That actually seems really strange to let that inefficiency go forward. Maybe it has to do with DC UPS not being up to snuff?
@@motmontheinternetSo you don’t actually have a UPS on DC power plants. You have a rectifier which is attached to battery banks which is different in some fairly fundamental ways. One of the bigger advantages is that it’s fairly easy to add more battery banks if you have the space - if you ever get a chance to look inside the cabinets at the base of a cell tower you’ll likely see at least one crammed to the brim with batteries for this purpose.
The reason it isn’t used in more enterprise spaces is that the copper cabling is fairly expensive and terminations etc require specialized skills
That’s why so important to consume more power versus licensing VMware for a single year
Here's a better example to illustrate how much power these systems consume: the average US household consumes on average about 1.3kwh. So one of those NVIDIA racks is equivalent to about 100 US household's power.
Just by looking the nominal current one of those GPU chips take, back in the day we would call it a short circuit.
That's absolute madness, just like the power consumption of cryptocurrencies.
Sooo, no mention of the data center's massive fresh water usage? And the ramifications of it, while the whole country is dealing with water scarcities?
The whole country isn't dealing with water scarcities? Like it's very much a regional thing.
Just to give you some idea, I know most of the new AI data centers here in Arizona are being designed for net water neutral operation. Also the fuel cells make water
Please do note how politically charged the word "fresh" is in fresh water usage. What do you expect to run those servers on? Murky shit water "fresh" from the sewage systems? The radiator in your car also uses "fresh water" (distilled) and maybe with a tiny bit of chemicals to reduce rust, but in essence, it is fresh water. The radiators inside your house that run off of central heating (if applicable), your toilet and also the shower use "fresh" water to do their thing.
I know that there were some cases where actual fresh water from e.g. rivers was being sucked up and disposed of by some DC's and that this is ridiculous, but to only focus on the water aspect in a world full of dwindling resources is a bit narrow minded IMHO.
@@masterTigress96 your grievance is selective at best, that water could have been used by the locals regardless. To your point, had they taken murky sewage water, purified it, then used it. We wouldn't be having this conversation. But to say people have such a small attention span that we can't focus on more than one thing, and that this should not be what we focus on is rather dismissive. Because you can literally say that about any topic, and then nothing gets done.
@@juanmacias5922 That is true but they can filter out *any* water before "serving" it to the people. They can filter water from the ocean before you flush it down the toilet, and they can clean it up again before disposing of it, maybe even recycling it.
The problem is cost and of course the ancient old turn-pee-into-water-again machine has the same failure rate as the ice cream machines at McDonalds.
It comes down to cost, and we only serve so much clean, usable water, to so many people, for a given price.
The question is then, if you think that the locals should take priority, how are you going to convince the DC's to agree to paying more for water that needs to be filtered or brought in some other way?
The investors/stake holders of the DC's will lobby and say they are more important then the locals, and the locals will do the opposite.
If we had an infinite amount of clean water and no worries of it running out or having droughts or water shortages, we wouldn't be having this conversation.
So the question is, who is going to be the one that will "feast" on the shitsandwich of paying for more expensive water?
The same can be said about electricity, locals could benefit from cheap electricity from a local power source (solar, wind, nuclear, etc). but that is now also being redirected to the DC's.
The list goes on an on, there is a major lack of any sort of real jobs being created by DC's nor are they kept in the area after the DC is build.
My point was that just the water aspect of it is just one tiny portion of a whole list of bigger issues that (could be) attributed to datacenters.
Nature: "We'll see how long this joke will last."
Vertiv out of Ohio are great then
and you went to Columbus site
Delaware facility would make good money off ppl like us just doing tours
whoa, that's in Delaware, OH? that's where i live!
Yes!
AI is a waste of money - people who uses it for any kind of business will find out sooner or later, that the quality of their work is going downhill.
😂
This just depresses me. We should be working towards using less power, computers are already fast enough for most day to day things people wanna do - send messages, photos, videos and look at photos, videos, documents. Instead of focusing on making all those things easier to use and more power efficient we choose to make the world unlivable and for what? A glorified search engine that can make uncanny valley style images?
It's madness. This power guzzling approach to servers is not sustainable. It's only a matter of time before politicians and and legislators begin demanding vendor stop this nonsense of designing such wasteful hardware.
They have a serious problem legislatin away from this. Either Intel/AMD/Nvdia are at the leading edge, or we hand that over to China/Russia.
It is that simple, stay on top or give "the enemy" the upper hand as far as technology goes. This will VERY fast give equal more advanced military capability and loosing THAT edge would likely be the end of USA (as we know it)
It is completely fine under the guise of net zero and sustainability. Aka their words mean nothing.
@@ssu7653Russia? They don’t have anything close to Intel/AMD/Nvidia, and probably never will how things are going.
And while China is trying to compete, their strategy is mainly to copy and reverse engineer, not being a technology leader (also they rely on chip fabs in Taiwan).
Not the politicians. They are stupid and make everything worse. These companies can figure it out themselves.
I think you're missing the point somewhat. A large part of the reason the power demand per system has gotten so high is because they've been able to cram WAY more capability into each system. If each new 1kW server replaces 4 500W servers then your efficiency has gotten a lot better. But off course, you can reasonably argue that the software running on these servers has gotten excessively compute hungry compared to the benefit it provides.
It is clear that data centers are both greedy in electrical power and large producers of heat, so much so that near my home a university data center has not obtained the right to be built if and only if it involves the recovery of its heat produced to become the central heating of the neighborhood's residential buildings.
another solution for data centers that is a bit extreme and little seen on YT pushes the concept of liquid cooling to the grand master level!
no more tubing cluttering the box to distribute the liquid to just the right places
no, the grand master level is the submerged data center!!
the motherboard and all its accessories (duly adapted to such a situation) find themselves drowned in a carefully chosen coolant (like that of your car - not a wise choice - but you get the idea)
Perhaps several Motherboards will end up immersed in a tub of mineral oil. There goes the case, case fans, CPU cooler, GPU cooler, Network cards and several heat-sinks. Water boils at a temperature too low for high end parts. Being able to pump oil via radiators, or heat pump tech may be an answer.
NIC's shouldn't need their own OS. But in all fairness, BOTH of the cards you held up run their own OS. (the connect-x card has onboard firmware. possibly vxworks based, I've never disassembled the firmware.) Much older cards also have "microcode", but most aren't upgradable. (I recall updating the microcode on some broadcom embedded nics on supermicro boards - they behaved badly without the IPMI card.)
There is a pretty big difference between running CX microcode (that the BF-3 has too) and the BF-3 running ESXio or Ubuntu.
@@ServeTheHomeVideo Indeed. At that point, you have a "co-processor", not a NIC.
I hope we’re building power plants.
I wish he added the info that currently, the most important thing for a new Datacenter is the proximity to a nuclear reactor since they are a source for consistently high amounts of energy. Especially since most of the world bought into the hype of solar and wind which are the opposite of what a datacenter needs.
I do appreciate that he highlighted the fact that they do use solar and stuff to cover their energy costs tho
I frankly don’t understand why the choice was made that it was okay to break the power envelope to get more performance. Part of this is probably intel’s stagnation pre-AMD that broke the dam because it was the only way to increase performance for several gens, but how were data centers okay with just eating that cost for relatively less performance per watt?
Even if it is less performance per watt, i am 99% sure the performance per cost is WAY up.
Take the total system cost including software, cooling, running cost and space over the expected lifetime then compare it to how much work is done.
Performance is WAY cheaper today than it was 10 years ago.
Performance per watt has actually increased !
@@prashanthb6521 I know it has increased, but the incremental gains have slowed significantly it seems. I guess my question is, when data centers realized they would have to start massively scaling up their power supply and cooling solutions, at high cost, for let's say the same % gain in performance YoY, why didn't they balk?
@@TheShortStory I think this video is making people look at this issue in a wrong way. There is no serious problem here. Assume a DC had 1000 old servers running. Now it requires only 600 new servers to run the same software load. Also these 600 new servers consume the same power as those old 1000 ones. Now every rack in that DC is 40% empty, so they decide to fill it up with more new servers, raising the total power consumption of that DC. If they dont want to spend on new power supply and cooling then the DC operators should run only 600 servers. They would still be earning the same revenue. On top of that they would be spending less on maintenance as they are operating less machines.
@@prashanthb6521 I’m sure the business case makes sense at the end of the day, because otherwise this transition to higher-power devices wouldn’t have happened. But someone is paying that price, whether it’s VC or consumers or data center operators through their margins. Yes efficiency increases but if a data center requires a major modification that is a cost for both the environment and _somebody_. I wonder if there really was demand for all that added performance before the AI craze began
dunno, that power supply looks physically smaller than my now "ancient" supermicro power supply but packs MORE of a punch (1KW vs 2KW)😳
We're going to need a huge amount of hamsters to run each server farm or that power plants and server farms are going to be combined
The real question I have is, how many servers do I need to run to heat my house in the winter.
Home space for cloud computing. I can charge Great rates in the winter.
There are businesses doing this. Even data centers making hot water/ steam loops for heat
Those panels look tired af. The fuel cell is awesome, don’t get me wrong, but is it green? It may have its place as energy storage. Because that’s what it is, but I strongly doubt that hydrogen is “green.” Hydrogen might legitimately have benefits over other storage solutions in this application. It looked like it was all truck portable. Temporary/ portable backup generation is a niche I can see hydrogen filling. Thing is, just like batteries, hydrogen on its own is only slightly greener than the alternative. It’s the energy that fills it (tank or battery) that makes it green.
Let’s forget “AI.” My guess is it will end up raising power budgets substantially permanently but not nearly to the degree we are seeing in the bubble.
If we could cut out just half the tracking & ad related power overhead, how much could that theoretically save? In other words, how much overall compute is spent on surveillance capitalism? Reckon we could (theoretically) legislate away a quarter to half of ad tracking BS? How much could that lower the power bill?
another channel I susbcribe to succumbing to genAI thumbnails ;-; they're so bad
Agreed. We have 3 in rotation. I had no idea how to thumbnail this one
Can i report just the thumbnail?
They better start making more wind turbines
No to much changes when you talk about number of cpus and dimms per server. About 15 years ago, the servers were the same using 450w power supply. I used about 6KW power per rack.
The processing power increased and so the power consumption.
I wouldn't feel comfortable putting large lithium batteries anywhere near multi-million dollar clusters.
You can see the different containment and spacing of the lithium batteries
@@ServeTheHomeVideo Yep, that help to limit thermal runaway propagation. Gaseous fire suppression is only partially effective against lithium ion batteries. The FAA did some testing, the results are online. I'd feel more comfortable with a thick concrete wall between those batteries and anything important. I could store gasoline in a DC and manage the risks but at the end of the day it's better to have the gasoline somewhere else.
@@ServeTheHomeVideoth-cam.com/video/C3m6Md4WBVg/w-d-xo.htmlfeature=shared
Is this an entire video to say "Modern servers now typically include GPUs, which was not the case historically"?
You talk like if 1MW was a really lot.
For comparison:
One German intercity train can take over 11MW (ICE 4)
Or an ordinary Tramway takes around 500kW
I think the point was 1MW was small with modern AI systems being 100x that, and likely 300x or more than that next year
to bad they can't use the heat generated and convert it back into power. also, it would be better if they just used dc power, especially when using batteries and solar. just fyi, elon(tesla) mentioned their data center will be using 500mw
The Matrix was prescient. Use "Human' as battery! That _is_ the one thing the world will never run out because if it actually happens, power will be the last of our worry.
Soon we'll see mini nuclear reactors being built alongside of new data centers. 🤣 That's just crazy. They need to build solar / wind farms to keep their carbon footprint down.
Yes. Those talks are already happening in places like Virginia.
i have a dell r630 and i use the 500w power supplies because i have only 2 sata ssd for os only and 8 nvme as storage for the LLM and Image models and it running 12 dim of 4rx32gb 2300mhz ram and two xeon e5 2699v4. i am idling on about 130w and max out at 450w. also it is only one power supply used at the time. i know this is risky but it is done for power evfessency also i dont care if the server goes down and restart with the second psu if psu 1 fails. everything is automated and volatile nothing is critical on this system. it is used for LLM and image generation it has no other purpose nothing is saved on this system. every image that is created and i want to saved is on my nas .
That is great sizing
@@ServeTheHomeVideo theoretically, the server can use more power, but in real-world conditions more than 450w was not possible for me at 100% load with a LLM or ai image creation. i dont know how much more the psu can handle or if the dell power management system switches the second psu also on. in the ipmi is only 2 option for the power management. one does distribute the power evenly, the other sets a power supply priority. but what happen in the background is not clear communicated in the ipmi and the lastes user manual that is public available is so old it even does not mention the use of the v4 cpus or 128gb ddr4 dims. i have the idrac 8 and a enterprise license luckily so some things are possible but sins i dont work in the server space or have anything to with IT systems professionally.
i write programs and drivers as my hobby so i have a idea how this works but no prove to it.
if it uses the second psu if power goes over the 495W that one psu can provude. nice no down time but if not is this also ok and i change the setting and use both.