No way we'd look better than you in that merch, you're a baller Patrick
Really cool to see how the industry is trying to standardize on using PCIe for everything, and that AMD is ready for it with so many lanes in such a flexible design.
Isn't PCIe pretty standard? Unless it's different on the enterprise side.
Too bad it doesn't work half the time.
@@johngrave5554 It is very standard in slots. When you start talking about backplanes, standardization of new form factors for server use, etc., there are many more question marks and engineering decisions. There are also generational learning curves. Think about bandwidth issues with PCIe SSDs in RAID being processed through the CPU(s). 160 lanes of PCIe in x4 SSDs is 40 drives with parallel access. That's quite the firehose to drink from.
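A rough sketch of that lane math, assuming roughly 2 GB/s of usable bandwidth per PCIe Gen4 lane (a nominal peak figure, not a measured number from this system):

```python
# Back-of-envelope: how many x4 NVMe SSDs fit in 160 PCIe Gen4 lanes,
# and what the aggregate peak bandwidth looks like.
TOTAL_LANES = 160
LANES_PER_SSD = 4
GBPS_PER_GEN4_LANE = 1.969  # ~2 GB/s per lane at 16 GT/s with 128b/130b encoding

drives = TOTAL_LANES // LANES_PER_SSD
aggregate_gbps = TOTAL_LANES * GBPS_PER_GEN4_LANE

print(f"{drives} x4 drives with parallel access")    # 40 drives
print(f"~{aggregate_gbps:.0f} GB/s aggregate peak")  # ~315 GB/s of firehose
```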
07:55 AMD is doing it so friggin' smartly :O AMD right now is a truly engineer-led company and it shows all around. I am so glad we've been an AMD shop even through the thick of it, and now with EPYC CPUs it's reaaaally starting to show on the bottom line as well. We can't build them fast enough :)
Well, on their last attempt at doing so they almost went bankrupt. They arrogantly mocked Intel and the Core architecture for using a multi-die package while their own multicore was a true single-chip solution, and they failed terribly after the Athlon. Haha, so ironic that they have now essentially swapped approaches with Intel and are successful again, while Intel is in a dead end.
@@llothar68 Intel is pathetic. They couldn't even retain the number one spot in a single segment. Not server, laptop, desktop, nada. Just a bunch of security-vulnerable, overpriced chips.
Awesome video! Does this mean we can use the PCIe Gen4 links as lanes for attaching DPUs to the EPYCs?
Wow! Thank you! AMD would likely have you connect the extra PCIe lanes to one of their Pensando DPUs, but a BlueField-2 would work as well.
How many PCIe lanes?
AMD: yes
beat me to it
Unfortunately not true for Ryzen; some PCIe slot or SATA port always has to be disabled in order for a second M.2 slot to work.
@@xWood4000 Really want Threadripper Pro to release for the DIY market. 16 cores is all I need, but I'd like those lanes for scaling up NVMe as I need it (which will hopefully become cheaper in the future), and something more focused on the client side than EPYC, which isn't even officially supported for that.
@@hugevibez Yes, regular Threadripper is better than Ryzen, but Threadripper Pro is really needed.
@@xWood4000 All I really want TR Pro (WRX80) or the rumored TRX80 for is a 16-core part with extra lanes and more bandwidth, which doesn't exist on regular TRX40 so far but definitely could, and I'd be fine with "just" 64 lanes and 4 channels of memory.
Suppose that stuff will have to wait for the SP5 socket next year.
you are a great teacher!
Ha! Thank you for the kind words.
I just love these new AMD processors from a hardware perspective... I find them even more interesting than ARM, with their almost infinite options. P.S.: Love this kind of video; Wendell from L1T also gets super technical, which I recently found I'm just addicted to :)) Great job +1!
That background wood reminds me of my home in Java.
We have 5 of these 7525's running a VDI cluster. I did not know this was a feature.
It does not appear to be one Dell advertises widely.
9:06 STH is making segues like LTT lol
Hey, shouldn't 2x16 lanes per socket to 2 other sockets work well though, for a total of 3 sockets and 3x4x16 free PCIe lanes? Given that Naples already solved network loops, it seems reasonable to me to do a triangle socket topology using Rome.
Technically even a 2x XGMI link socket-to-socket configuration (192 available lanes in 2P) works, but that is not supported by AMD; only 3x and 4x XGMI links are. Going to 3 sockets adds additional complexity.
@@ServeTheHomeVideo So, 2 sets of 2x links on each I/O die, one set going to each other socket, is what I'm saying. Doesn't seem that complex IMHO; things like broadcasting cache coherency are just sending to both. Past 3 sockets seems insane though, needing n!/(2!(n-2)!) connections for a point-to-point design like that.
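For reference, a quick sketch of that link-count math: a fully connected point-to-point topology of n sockets needs C(n, 2) = n(n-1)/2 direct socket-to-socket connections (the socket counts below are just illustrative).

```python
from math import comb

def fully_connected_links(sockets: int) -> int:
    """Direct links needed so every socket talks to every other socket:
    C(n, 2) = n! / (2! * (n - 2)!) = n * (n - 1) / 2."""
    return comb(sockets, 2)

for n in (2, 3, 4, 8):
    print(f"{n} sockets -> {fully_connected_links(n)} point-to-point links")
# 2 -> 1, 3 -> 3, 4 -> 6, 8 -> 28
```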
I never thought about the t-shirt tbh 😂
Old MacDonald had a server farm 14:21
P C B I O!
I can't wait.
In 7 years I'll be able to buy one of these dual-socket 128-core Dell servers, like when I bought a top-of-the-line dual Xeon E5649, 12-core, 128GB DDR3, 24x 2.5" hot-swap Dell PowerEdge C2100 for $210, 7 years after it was released for, I'm guessing, $35-65K.
I wonder how big the performance limitations are with this setup...
So is 192 lanes possible as well? At least theoretically?
I have seen the 192 lane designs. They exist. AMD does not support that configuration officially so most are 128 or 160 lane designs.
@@ServeTheHomeVideo very cool to be honest. Thanks for the reply :)
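For reference, a rough sketch of where the 128-, 160-, and 192-lane figures in this thread come from, assuming each Rome socket exposes 128 flexible SerDes lanes and each xGMI (Infinity Fabric) link consumes an x16 block per socket:

```python
# Lanes left for PCIe I/O in a 2P EPYC Rome system for a given number of xGMI links.
LANES_PER_SOCKET = 128
LANES_PER_XGMI_LINK = 16

def free_pcie_lanes_2p(xgmi_links: int) -> int:
    """PCIe lanes available across both sockets after the socket interconnect."""
    return 2 * (LANES_PER_SOCKET - xgmi_links * LANES_PER_XGMI_LINK)

for links in (4, 3, 2):
    print(f"{links}x xGMI links -> {free_pcie_lanes_2p(links)} PCIe lanes")
# 4x -> 128 (common), 3x -> 160 (this R7525 design), 2x -> 192 (not supported by AMD)
```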
Is the WAFL lane part of the 96 (24 x 4) lanes?
What would be a use case for this kind of setup? VDI perhaps. But what about DR...?
NVMe RAID? :) BTW, at 1:53, the term is *inter*-socket. "Intra" is "within", "inter" is "between".
Thinking about buying this server. Do you know if you can use non-Dell hard drives?
It won't be covered by support if you do. Source: I work at Dell.
On hard drives, we are using the 24x NVMe backplane, so we do not have the 3.5" HDD or 2.5" SATA/SAS HDD backplane in here. We have seen Kioxia CM6/CD6 drives that we got from Kioxia instead of Dell link on this system at PCIe Gen4 speeds. Using third-party drives will not be supported by Dell, so there is another layer of concern there.
Is there a reason EPYC parts couldn't use HBM2e as L4? A CPU with the memory bandwidth of a GPU would be nuts for some workloads; CPU cores have progressed to the point where memory bandwidth is becoming a more and more widespread bottleneck.
If the latency of HBM2e as L4 causes regressions for some workloads, the option to disable it would be useful, and doing so on a per-process basis instead of in the BIOS would be very nice. Extending the "Infinity Cache" concept of smartly handling cache to increase performance (by determining which data performs worse with L4 and bypassing L4 when appropriate) would take a lot of fine-tuning effort away from the user, if such a thing could be made to work well.
No real reason other than the cost/complexity of the offering. AMD also made Rome to compete with Ice Lake Xeon, so it did not need this type of feature. We will see more co-packaged memory in the future. Disaggregating the CCDs makes this easier.
Are there U.2 drives which can do 8xGen3 and/or 4xGen4?
Hi Bernd, are you thinking of something like this? www.servethehome.com/kioxia-cm6-review-pcie-gen4-ssds-for-the-data-center/
@@ServeTheHomeVideo Thanks. Actually I was missing the fact that Dell seems to offer quite a few Gen4 U.2 drives already. However, since the Dell website is a pain in the ass, it's hard to find their specs (and some have ridiculous prices in the New Zealand shop, which for whatever reason Dell wants to show me). Btw, what's the magic sauce to actually select the server in the configurator? It seems to insist on a PERC config for the two models which are shown to me.
@@berndeckenfels It took me 3-4 minutes of configuring to get to the screenshot and I was working off the Dell BOM I had for the server we are reviewing.
@@ServeTheHomeVideo Maybe it's an international restriction; do you start with the base models which use the SATA chassis?
don’t those unnecessarily long cables artificially increase the latency?
P.S.: Is that a proprietary cable? Doesn't look like an octalink (or octolink, I never know which) cable to me..? Also, are octalink cables already PCIe 4.0 (x8) compatible?
Do you mean the XGMI cable? www.servethehome.com/dell-and-amd-showcase-future-of-servers-160-pcie-lane-design/dell-emc-poweredge-r7525-xgmi-cables-to-nvme-backplane/
I'm not sure if AMD has standardized a cable for XGMI links.
@@Zizzily Yeah, XGMI in that application, but originally I meant OCuLink (I always forget this and mix it up with Octalink)... Actually I just found that many use different connectors, because there are several suitable for the PCIe protocol, but I mostly saw OCuLink, probably because it has SAS capability as well :SS
The R7525 is a fantastic host. I'm running a single-processor version with a mixed SAS/SATA/NVMe backplane for a homelab. When you factor in all the vouchers, it becomes very competitive against second-hand hosts, with tons of expandability. Looking forward to the coming videos.
What I noticed he did not talk about is the 1 USB 2 and the 1 VGA port on the front. USB 2 and VGA don't really take a lot, nor are they really used in rack-mount servers, but for them to be functional you need them to be hooked into the CPU somehow, either by having their own daughterboard that takes the signals to the motherboard's CPU or by taking a break-off from the CPU PCIe lanes for those 2 ports. Either way you do need at least 1 lane to divide between them IF they are functional ports and not legacy ports that aren't hooked into anything.
Generally on servers, especially EPYC 7002 servers, these are tied to the BMC. The BMC utilizes a x1 WAFL lane to the CPU that is distinct from the main flexible I/O blocks. We covered this in the previous article on the STH main site.
@@ServeTheHomeVideo As this was the first video of yours I saw, I did not know. Thank you for responding in a way that makes sense and is informative.
Intel: Buy this, it's new.
AMD: Buy this: it's new AND improved.
Winner: AMD
Can I make a 160-GPU mining rig with it?
What’s a DPU? 🙄
th-cam.com/video/S92rdAwIuNk/w-d-xo.html
It would be great to see a storage benchmark. Maybe RAID 0? 😁
That might be a problem for the CPU. Even with 128 cores this is gonna be an extreme load.
It actually is even a more acute problem on this type of system. You start running into memory bandwidth limits because 24x PCIe Gen4 lanes are a very high proportion of total memory bandwidth. In addition, with less socket-to-socket bandwidth, you start needing to test these using 100/200GbE NICs on each socket but end up limited in that case by network bandwidth. Many challenges!
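A rough back-of-envelope sketch of that bandwidth tension, assuming nominal peak figures of ~2 GB/s per PCIe Gen4 lane and 8 channels of DDR4-3200 per socket (illustrative numbers, not measurements from this system):

```python
# Compare aggregate NVMe bandwidth from 24x Gen4 x4 drives against one socket's DRAM bandwidth.
GBPS_PER_GEN4_LANE = 1.969          # 16 GT/s with 128b/130b encoding
GBPS_PER_DDR4_3200_CHANNEL = 25.6   # 3200 MT/s * 8 bytes per channel

nvme_gbps = 24 * 4 * GBPS_PER_GEN4_LANE       # 24 drives, x4 lanes each
dram_gbps = 8 * GBPS_PER_DDR4_3200_CHANNEL    # 8-channel DDR4-3200, one socket

print(f"NVMe aggregate:  ~{nvme_gbps:.0f} GB/s")  # ~189 GB/s
print(f"DRAM per socket: ~{dram_gbps:.0f} GB/s")  # ~205 GB/s
print(f"NVMe / DRAM:      {nvme_gbps / dram_gbps:.2f}")
```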
See the LTT and Level1Techs coverage of a big NVMe server. Getting the full bandwidth of all devices at the same time is very tricky.
It's still weird seeing EMC on Dell servers.
160 PCIe lanes and vendor-locked CPUs that will never work in another box
smh
Just goes to show, wired is better than wireless.
wireless is a shared communication medium - it is equivalent to just having 1 wire connected to everything. (and yes, you could do fancy frequency domain signalling wired, optical fibres use both frequency and phase, and does ADSL an internet connection sharing the underlying telephone line)
@@andytroo woosh. I was joking about having AMD and Dell having internal cabling for flexibility of CPU PCIe lanes instead of traces on the motherboard. But then I'd have to explain that while traces are technically wires, they're fixed in on a PCB, and explaining the joke is never funny.
You're not wrong about RF though.
AMD is a better choice than Intel in the server category.
damn, Naples really was messy.
Very. We actually have quite a few Naples generation 1P systems we use for STH hosting and in the test lab.
Sounds like Intel forgot to pay Dell this time around ;)
Joking aside, it's still asinine that Dell is locking the EPYC CPUs to their servers only, which means that unless Dell servers + Dell-locked CPUs are suuuuuper cheap I will not touch them ... and I tend to regularly spend 4-figure sums _monthly_ on hardware ...
**edit** Mistakes were made. 5-figure sums each month.
With spending a four-figure sum monthly, you only have to save up for two years to populate the NVMe slots of this server with high-end storage!
10:00 Scratch that, Intel did not forget to pay Dell. Just like SC, who are handicapping all AMD offerings.
@@tommihommi1 Uhm, 2TB NVMe drives are under 200€ each. Not too bad.
Uhm, and now I realize the mistake, 5 figures monthly that is :)
@@skaltura Good luck finding a server grade PCIe 4.0 drive with decent capacity for under $2000
@@tommihommi1 lol. Sounds to me like you do not work in this field OR you work on corporations with "no regard towards budget" :)
Dell has always been Intel's bitch; no wonder these EPYC systems are hard to find ;)