Crazy Ethernet SSD Hands on with Kioxia EM6 NVMeoF SSDs

  • Published Jun 7, 2024
  • We go into the new Ethernet-connected SSD from Kioxia, the Kioxia EM6. This NVMe/NVMeoF SSD connects over 25GbE instead of a traditional PCIe host interface. We show what goes into making these work compared to traditional NVMe and SAS drives. We then go hands-on to show both single drives and RAID 0 configurations. This may be the future of SSDs.
    STH Main Site Article: www.servethehome.com/ethernet...
    STH Merch on Spring: the-sth-merch-shop.myteesprin...
    STH Top 5 Weekly Newsletter: eepurl.com/dryM09
    STH Forums: forums.servethehome.com
    ----------------------------------------------------------------------
    Where to Find STH
    ----------------------------------------------------------------------
    STH Forums: forums.servethehome.com
    Follow on Twitter: / servethehome
    ----------------------------------------------------------------------
    Timestamps
    ----------------------------------------------------------------------
    00:00 Introduction
    02:51 Making the Ethernet SSD Work
    07:36 Microsoft Azure SONiC Being Used
    08:20 Logging into an SSD via Telnet
    08:30 Connecting to a Single NVMeoF SSD
    10:12 RAID 0 Ethernet SSD Setup
    12:02 Kioxia EM6 and Namespaces
    13:55 Kioxia EM6 Impact
    14:55 Wrap-up
    ----------------------------------------------------------------------
    Other STH Content Mentioned in this Video
    ----------------------------------------------------------------------
    - SC21 Top 10: • Top 10 Showcases at Su...
    - What is EDSFF: • EDSFF E1 and E3 to Rep...
    - DPU v. SmartNIC - • DPU vs SmartNIC vs Exo...
    - What is a DPU - • What is a DPU - A Quic...
    - Why AMD is spending $1.9B on Pensando for DPUs: • Why AMD is Spending $1...
    - NVIDIA's 2022-2023 Data Center Plans: • NVIDIA's Crazy 2022-23...
    - AMD EPYC 7773X Milan-X Review - • Crazy! AMD's Milan-X D...
    - GB era of Server CPUs - • Why Server CPUs Are Go...
    - All about server power efficiency - • How To Save $$$ Poweri...
    - Dual AMD EPYC 8x NVIDIA A100 Server - • Top-End AI Training In...
    - Liquid cooling AMD EPYC and NVIDIA A100 Servers - • Liquid Cooling High-En...
    - SC21 Liquid Cooling - • Liquid Cooling Takes O...
    - Fungible F1 DPU - • Fungible F1 DPU Powere...
    - PhoenixNAP Data Center Tour - • A Fun Data Center Tour...
  • Science & Technology

Comments • 248

  • @blackraen
    @blackraen 2 years ago +97

    As a large datacenter storage admin, this completely blew my mind. I spent probably an hour hypothesizing breathlessly with colleagues about the options and possibilities this introduces, especially for highly composable infrastructure build-outs. But then I realized the hitch. What's the price tag on these drives? Adding the hardware to make these things work as network endpoints is probably not going to make the cost of NVMe flash storage go down.

    • @jacj2490
      @jacj2490 2 years ago +8

      Notice they also offer an adapter, meaning normal NVMe drives are supported. In terms of TCO, I think it will be cheaper if you consider the cost of a storage server with CPUs, RAM, HBA, NIC, etc. and a switch.

    • @lhxperimental
      @lhxperimental 2 years ago +5

      There will be an SoC that handles the network and flash storage logic. Since the requirements will be put down in a spec, the SoC and firmware can be highly optimised for the task. So it could become as cheap as a consumer-grade router SoC. But it has to become an industry standard to drive the economies of scale. Till then it will be expensive.

    • @Vatharian
      @Vatharian 2 years ago +8

      Have you used any networked PCI-Express? In our test lab we have some PCIe-over-fabric (I won't say NVMe-oF, as there are other devices on endpoints, from accelerators to RAM tanks) and genuine PCI-Express networks. While the reach is very short (no optical stuff, as far as I know), within a single aisle at best, the scalability and ease of reconfiguration this introduced into our systems is insane. The cost of PCIe switches is also starting to fall, and with networked Gen4 it actually makes sense.

    • @jonathan3917
      @jonathan3917 2 years ago

      Would you be able to do away with storage servers with this and just have racks of network switches? I know people are moving more towards software-defined storage, but cutting out the middle men seems risky to data integrity. Other than that, this blew my mind too because it seems so scalable compared to current configurations.

    • @2xKTfc
      @2xKTfc 2 years ago +2

      Opens the possibility of a single DNS snafu to knock a lot of separate storage offline, rather than a few big SAN appliances. :)

  • @aterribleloss
    @aterribleloss 2 years ago +43

    It would be really interesting to run Ceph OSDs directly on the drives or the NVMeoF adapter board. It would allow for some very interesting access with DPUs.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +19

      Yes. The DPU video coming soon (hoping next week) will shine more light on how to do this.

    • @DJ-Manuel
      @DJ-Manuel 2 years ago +2

      Was thinking the same, CEPH OSDs directly would make a lot of sense 👍

    • @InIMoeK
      @InIMoeK 2 years ago +1

      This has actually been done in the past with HDD's

    • @xPakrikx
      @xPakrikx 2 years ago

      My comment was deleted? Hmm, nice :/ Their project was non-commercial. Well, this is weird.

    • @itujjp
      @itujjp 2 years ago

      @@xPakrikx Would that be the 504 OSD Ceph cluster blog entry from Sage back in 2016? I did wonder what happened to the He8 Microserver stuff they were testing Ceph on. Would love to see it.

  • @EXOgreenMC
    @EXOgreenMC 2 years ago +46

    I am in love. I can't wait till this hardware gets to a point where a pro-sumer/homelabber can mess with it!

    • @aaronchamberlain4698
      @aaronchamberlain4698 2 years ago +2

      I mean..... I feel like that would be pretty far off. PCIe Gen 4 x4 (typical NVMe SSD) is rated at about 8 GB/s (that's a big B as in bytes). A 10GbE switch is small b as in bits, or about 1250 MB/s. So 10GbE is a roughly 6x slower interface than PCIe Gen 4 x4. So really, you kinda need to be at the 100GbE scale for this to make sense, because even a 40GbE switch is still only 5 GB/s. There are some "affordable" switches with 40GbE at the moment, but it's usually just uplink ports. Dunno. I feel like it would be a long time before it either gets retired or reaches the scale needed to be cheap to pro-sumers.
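      A quick sanity check on those figures (editor's back-of-envelope in Python; raw line rates only, real-world throughput is lower on both the PCIe and Ethernet side):

      ```python
      # Compare raw link rates: PCIe 4.0 x4 vs. common Ethernet speeds.
      # 16 GT/s per lane, x4 lanes, 128b/130b encoding, converted to GB/s.
      pcie_gen4_x4_gbs = 16 * 4 * 128 / 130 / 8          # ~7.9 GB/s
      ethernet = {"10GbE": 10, "25GbE": 25, "40GbE": 40, "100GbE": 100}

      for name, gbps in ethernet.items():
          gbs = gbps / 8                                  # bits -> bytes
          print(f"{name}: {gbs:.2f} GB/s, ratio vs PCIe 4.0 x4 = {pcie_gen4_x4_gbs / gbs:.1f}x")
      # 10GbE comes out ~6x slower; 100GbE is actually faster than a single Gen4 x4 drive link.
      ```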

    • @JeffGeerling
      @JeffGeerling 2 years ago +9

      @@aaronchamberlain4698 what about HDD over Fabric? 🤪

    • @KingLarbear
      @KingLarbear 2 years ago +1

      I can't wait until a con-sumer can get this and plug and play one day, but they have a long ways to go

    • @WizardNumberNext
      @WizardNumberNext 2 years ago +2

      @@JeffGeerling it is called Fibre Channel and is nothing new

    • @BobHannent
      @BobHannent 2 years ago +1

      @@JeffGeerling it would be cool to see something like an interposer board made from a cheap Arm SoC which has native 1GbE and SATA, and some have PCIe as well.
      One of those sub-$20 designs would be great.
      Perhaps there will be an affordable 2.5GbE SoC soon from one of the vendors.

  • @georgehenes3808
    @georgehenes3808 2 years ago +9

    I’m going to watch this three times in a row, and see if I can see how this makes the “don’t trust anything” model of data management easier and more cost-effective to execute. I don’t fancy my chances! Thank you Patrick for keeping on bringing us the crazy new things!

    • @bryansuh1985
      @bryansuh1985 2 years ago +5

      Well, one advantage I see is you don't have to trust a long line of SAS expanders and backplanes, so there are fewer devices to trust / maintain.

  • @ezforsaken
    @ezforsaken 2 years ago +1

    This is crazy and awesome, thanks for the video Patrick!

  • @Maleko48
    @Maleko48 2 years ago +1

    thanks for the well wishes Patrick, they started my morning off right 👍

  • @jeffjohnson9668
    @jeffjohnson9668 2 ปีที่แล้ว +2

    This is a pretty cool evolution of a storage device and goes along well with the evolution of nvnmet in the kernel. It'll be interesting to see what processing you'll be able to do on the way the SSD. If they're on the network, users are going to want trust/encryption and then they'll want to do something else. Perhaps the next video on DPUs will cover that and more!
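    For readers who haven't poked at that side of the kernel, this is roughly what attaching one of these network drives looks like from a host with nvme-cli (editor's sketch; the IP, port, and NQN are hypothetical placeholders, and the EM6 demo in the video uses the RDMA/RoCE transport rather than TCP):

    ```python
    import subprocess

    # Editor's sketch of the host ("initiator") side using nvme-cli.
    # Needs root and the nvme-fabrics kernel module; the address and NQN
    # below are hypothetical placeholders, not values from the video.
    TARGET_IP = "192.168.100.11"   # the drive's own Ethernet address
    TRANSPORT = "tcp"              # the EM6 demo would use "rdma" (RoCE v2) instead

    # Ask the target which NVMe subsystems it exposes.
    subprocess.run(["nvme", "discover", "-t", TRANSPORT, "-a", TARGET_IP, "-s", "4420"], check=True)

    # Connect to one subsystem; it then appears as a local /dev/nvmeXnY block device.
    subprocess.run(["nvme", "connect", "-t", TRANSPORT, "-a", TARGET_IP, "-s", "4420",
                    "-n", "nqn.2022-04.example:em6-drive-01"], check=True)

    subprocess.run(["nvme", "list"], check=True)   # confirm the new namespace showed up
    ```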

  • @jeremybarber2837
    @jeremybarber2837 2 years ago +12

    My brain is a little broken by this... in a totally good way. Excited to see what else you have in store for us.

  • @jacj2490
    @jacj2490 2 years ago

    Thanks Patrick, truly amazing. I was waiting for this review since you showed it last year. It is a great concept and I think it'll have a huge impact on the storage industry. I only wish you could do some benchmarks, because latency is the main factor, and I believe it'll be minimal here since, as you mentioned, it is directly attached to the network, hence less translation.
    Great job and thanks to the entire STH team.

  • @JeffGeerling
    @JeffGeerling 2 years ago +10

    I want more info on that adapter board... is it basically running an SoC on it that adapts PCIe to Ethernet?

    • @j0hn7r0n
      @j0hn7r0n 2 years ago +3

      I was wondering the same thing. If the Pi had more than a single PCIe lane and a faster NIC, I'm sure you'd create a Pi version of this @Jeff ;)
      But yeah, nvme-cli and NVMe over TCP have been available in Linux for a while now. Seems like if someone made a board that attaches directly to an NVMe drive and translates to TCP, we could DIY something similar.
      I've searched a bit for an affordable SBC with PCIe lanes + 2.5GbE to create a poor man's version. Unfortunately they're all: unavailable, too expensive, slow NIC, PCIe 2, not enough lanes, etc.
      It looks like even an ITX/ATX board as an NVMe host would be expensive, because only latest-gen server CPUs or HEDT support a lot of PCIe lanes (with room for a fast NIC) or the denser PCIe 4/5.
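      For anyone tempted by that DIY route: the Linux kernel already ships an NVMe-oF target (nvmet) driven through configfs, which is basically the job such a board's firmware would automate. A minimal sketch, assuming root, the nvmet and nvmet-tcp modules loaded, and a spare local drive (the device path, NQN, and IP are hypothetical):

      ```python
      from pathlib import Path

      # Editor's sketch: export a local NVMe namespace over NVMe/TCP with the
      # in-kernel nvmet target. Run as root; nvmet + nvmet-tcp must be loaded.
      NQN = "nqn.2022-04.example:diy-ethernet-ssd"   # hypothetical subsystem name
      DEVICE = "/dev/nvme0n1"                        # hypothetical local drive
      IP, PORT = "192.168.100.50", "4420"            # hypothetical listen address

      cfg = Path("/sys/kernel/config/nvmet")

      # 1. Create the subsystem; allow any host (fine for a lab, not production).
      subsys = cfg / "subsystems" / NQN
      subsys.mkdir()
      (subsys / "attr_allow_any_host").write_text("1")

      # 2. Attach the local block device as namespace 1 and enable it.
      ns = subsys / "namespaces" / "1"
      ns.mkdir()
      (ns / "device_path").write_text(DEVICE)
      (ns / "enable").write_text("1")

      # 3. Create a TCP port and link the subsystem to it.
      port = cfg / "ports" / "1"
      port.mkdir()
      (port / "addr_trtype").write_text("tcp")
      (port / "addr_adrfam").write_text("ipv4")
      (port / "addr_traddr").write_text(IP)
      (port / "addr_trsvcid").write_text(PORT)
      (port / "subsystems" / NQN).symlink_to(subsys)

      print(f"Connect from another box with: nvme connect -t tcp -a {IP} -s {PORT} -n {NQN}")
      ```

      Per Patrick's reply further down, the EM6 terminates NVMe-oF in the drive itself, so the equivalent of this target stack presumably lives in the drive's firmware with an RDMA transport instead of TCP.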

    • @prashanthb6521
      @prashanthb6521 2 years ago +2

      I suspect it does exactly that.

    • @j0hn7r0n
      @j0hn7r0n 2 years ago

      Maybe this? www.marvell.com/products/system-solutions/nvme-controllers.html

    • @gabrielsanchez1675
      @gabrielsanchez1675 2 years ago +1

      @@j0hn7r0n In the industrial computing world there are some backplanes for video applications which have PCIe switches for managing up to 14 x16 PCIe slots, and those are more affordable than any server with Xeons for PCIe lanes.
      I have installed some of those, and with a Core i5 or Pentium G we can connect many to the same system.

    • @project-xm3473
      @project-xm3473 2 years ago

      Jeff, are you planning to use this in a Raspberry Pi 4 project?🤔

  • @prashanthb6521
    @prashanthb6521 2 years ago +1

    This is totally awesome tech. I think this is the future of how datacenters will work. And thanks Patrick for bringing this to us.

    • @yvettedath1510
      @yvettedath1510 2 years ago

      except you won't run any datacenters anymore as company dickheads all rush towards Cloud

  • @TotesCray
    @TotesCray 2 years ago +2

    Awesome!!! Super excited to see DPU accelerated NVMeoF! Any word from Kioxia on when this will be commercially available?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago

      I think if you are a big customer you can call Ingrasys (Foxconn sub) and get these. The drives are launched

  • @krattah
    @krattah 2 years ago +23

    I'd really want one of these to play with at home and test out various crazy use-cases. Too bad they don't seem to be available to mere mortals. NVMe key-value mode could have some really interesting implications in setups like this.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +14

      I have been trying to get a pair for many years now

    • @kwinsch7423
      @kwinsch7423 2 years ago +3

      @@ServeTheHomeVideo Would be nice if the NVMe-to-NVMeoF adapter were available for testing. There is even a test version of the Seagate X18 out there. Would be really nice to have block storage available like that.

    • @timramich
      @timramich 2 years ago +1

      The type of tech that is showcased here isn't going to be available to us plebes for years, and it will be second-hand. I really don't understand the name of the channel. I would understand it if he bought and showcased stuff that is finally hitting the used market, for ya know, hobbyists at home to be looking for.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +5

      Well, STH is 13 years old at this point. The YT is

    • @timramich
      @timramich 2 years ago +1

      @@ServeTheHomeVideo "Part of the idea also for homelab folks is that the stuff used at work today trickles down in 3-5 years to homelabs as it is decommissioned." Yeah, so what, people are supposed to remember content from a video from 3-5 years ago? THAT's my whole point.

  • @pkt1213
    @pkt1213 2 years ago +3

    Very cool. I am picturing a lower cost adapter for SATA SSD or HDD and maybe a couple of 1 or 10 gig connectors being used as a disk shelf in a home lab. My size constraints led me to a case that I am not wild about and offloading the storage would be nice.

    • @russellzauner
      @russellzauner 1 year ago

      I'd just make an adapter for our current NVMe drives. We have super fast, super big, and super cheap NVMe drives already available to everyone - they've just wrapped them in an industrial grade carrier/package. Try buying an industrial RPi and it will blow your mind what they're asking for them - a Raspberry Pi.

  • @TheInternalNet
    @TheInternalNet 2 years ago +1

    As a sys administrator, this is so, so huge. I can not wait to play with this kind of tech.

  • @midnightwatchman1
    @midnightwatchman1 2 years ago

    As much as we do not like this translation between server and storage disks, it also allows you to offload the management and disk IO to another box. It would be interesting to see how it impacts the performance of the servers now that you have to manage the disks as well.

  • @MrHack4never
    @MrHack4never 2 years ago +4

    Another step toward the inevitable introduction of IPv8

    • @ligarsystm
      @ligarsystm 2 years ago +2

      IPv6 has more IPs than atoms in the known universe :P

    • @hariranormal5584
      @hariranormal5584 2 years ago +1

      @@ligarsystm
      The way we are distributing them is not so efficient, however.

  • @tad2021
    @tad2021 2 years ago +1

    I've used fibre channel before, this totally makes sense and is completely awesome.

  • @idahofur
    @idahofur 2 years ago +1

    The big thing I noticed is that each unit has its own controller, thus moving the bottleneck to the switch, provided you can max out the throughput of the switch. Everything else seems straightforward. Though at first I thought I saw 2.5Gb and not 25/50Gb. Probably due to lots of talk about 2.5Gb for home stuff. :)

  • @KingLarbear
    @KingLarbear 2 years ago +1

    Wow, the simplification of this server compared to others we see.... holy cow

  • @lost4468yt
    @lost4468yt 2 years ago +1

    This makes so much sense. Why didn't we do this before?

  • @alfblack2
    @alfblack2 2 years ago

    Sweeeet. And here I am prepping for iSCSI for my home lab.

  • @mentalplayground
    @mentalplayground 2 years ago +1

    Future is here :) Love it!

  • @ws2940
    @ws2940 2 years ago +1

    Definitely would make things a bit simpler hardware and setup wise. Definitely something to be on the look out for in the next few years.

  • @MoraFermi
    @MoraFermi 2 years ago +6

    Oh and now that I thought about it for a sec: It's basically Fibre Channel under a new name, isn't it.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +3

      But Ethernet, so you just use network admins. With FC, you are usually running Ethernet + FC.

    • @I4get42
      @I4get42 2 years ago +2

      Hi there! Kind of..... FC has a SCSI payload in the FC frame. NVMe over Fabrics wraps up an NVMe payload. And here that is NVMe essentially wrapped up in InfiniBand over Ethernet. RoCE v2 = RDMA (remote direct memory access, for InfiniBand clustering by calling on memory addressing of resources across a network) wrapped up in UDP, IP, and Ethernet. This way you can make the direct memory calls to the NVMe drives as if they were a local resource, but wrap it up and send it out across the Ethernet network.
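      To put rough numbers on all that wrapping (editor's sketch using typical header sizes, one 4 KiB payload per packet, no VLAN tag; real traffic also respects the path MTU, which this ignores):

      ```python
      # Approximate RoCE v2 framing cost for one 4 KiB RDMA payload.
      headers = {
          "Ethernet + FCS": 14 + 4,
          "IPv4": 20,
          "UDP (RoCE v2, dst port 4791)": 8,
          "InfiniBand BTH": 12,
          "InfiniBand RETH": 16,   # carried on RDMA READ/WRITE request packets
          "iCRC": 4,
      }
      payload = 4096
      overhead = sum(headers.values())
      print(f"{overhead} header bytes per {payload}-byte payload "
            f"(~{100 * overhead / (payload + overhead):.1f}% of the bytes on the wire)")
      ```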

  • @tinfever
    @tinfever 2 years ago

    Okay...that's pretty slick. I'm very keen to hear more about how one would actually use these SSDs, DPUs, or NVMEoF in production. I've always been wondering how you'd implement SSD redundancy with those. I'm assuming there isn't much compute power on each SSD so anything must be implemented on the storage consumer system. It does kind of scare me that you have to trust the storage consumer system to not accidentally screw up the disks it can access on the network. Oops...one server was acting up and so it wiped all the other namespaces on the 8 drives it was sharing...

  • @4sp3rti
    @4sp3rti 2 years ago +5

    This reminds me of Coraid's EtherDrive, from 2 decades ago. That was AoE (ATA over Ethernet) on a "pro" level.

    • @maxwellsmart3156
      @maxwellsmart3156 2 years ago

      Was thinking of using AoE with a number of Raspberry Pi 4s connected to inexpensive SSDs and using ZFS instead of software RAID. Unfortunately, getting RPis is not too easy now, but there are cheap (if not bulky) SFF PCs. Something cheap to experiment with for network storage.

    • @halbouma6720
      @halbouma6720 2 years ago

      Yeah, I still use AOE myself at the data center because it can basically do what is discussed here - except you don't get as many ethernet ports lol. I wonder if they are using it in the firmware.

  • @MrRedTux
    @MrRedTux 2 years ago +1

    This looks like a new variant of SATAoE (Serial ATA over Ethernet), which was a pretty cool way of inexpensively attaching network based disks directly to a host.

    • @MrDavidebond
      @MrDavidebond 2 years ago

      Also similar (but probably more scalable) to connecting SAS drives to a SAS switch to share with multiple servers.

  • @georgeashmore9420
    @georgeashmore9420 2 years ago +3

    I feel that I missed something, but where do the RAID calculations now take place if the drive is connected directly to the network?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +4

      Host in this case. DPU in the next video we will have in this series

  • @nullify.
    @nullify. 2 years ago

    So what about security? Seeing that you can telnet into the drives, I'm assuming they have an embedded Linux or some proprietary OS?

  • @produKtNZ
    @produKtNZ 2 years ago

    0:16 - Those storage bays have also had SCSI and SCSI U320 and more, I'm sure - unless my history is letting me down and those connection types only came with 3.5" form-factor HDDs?

  • @petruspotgieter4561
    @petruspotgieter4561 2 years ago +4

    A few years ago, around 2015, there was the Kinetic Open Storage project. It was launched with several vendors, but only Seagate made a drive, and only one. The tech was not limited to HDD, 4TB, and 1GigE, but that was unfortunately the only product. It supported OpenStack Swift and Ceph OSD. Maybe a Linux server running on each drive was too resource-inefficient in 2015.
    Hope Kioxia makes the Ethernet-direct-to-drive approach work commercially this time. The entire solution with a dual Ethernet switch in the same chassis is appealing.

    • @that_kca
      @that_kca 2 years ago

      The object storage/KV implementation in the Kinetic was super useful for many workloads and gives you the ability to grow horizontally across many, many disks. Combined with how Ceph does the hash map and replication, you get some really cool capabilities.

  • @ThePopolou
    @ThePopolou 2 years ago +4

    The modularity of the technology is fascinating but I'm trying to understand the implications on latency. Say you have an array spanning multiple racks with drives acting as network endpoints; I suspect the access speed of the array will only be determined by the furthest endpoint. It's a toss-up between the network layer and the CPU/silicon layer of traditional systems. How this compares to traditional PCIe-attached storage in a SAN will be interesting.

    • @russellzauner
      @russellzauner 1 year ago

      The speed of any data array appears to you as whatever the latency is of the furthest piece of data you need, regardless of how it's structured or configured.
      If your data is already where you need it, you can even break the network temporarily without interrupting the work being done.
      Remember when we thought the best computer was the fastest one?

  • @fat_pigeon
    @fat_pigeon 2 years ago

    How does security / access control work in this model? As a traditional server physically has exclusive access to its directly connected drives, it's a natural place to put a security boundary. With every drive as its own node, I guess you would use network security techniques like VLANs.
    Sort of relatedly, it seems like the traditional model's intermediating layer of server nodes tends to compartmentalize damage from (accidental) misconfiguration. If you're reconfiguring one node's disks you might lose the data on that node, but other nodes will be unaffected. If you're configuring the network between nodes, you might lose access temporarily, but your data is still there on each node and you can recover by reconnecting them to the network. In contrast, if all your disks are in one big pool and all the configuration is in software, what stops a misconfiguration from hosing all your disks at once? In particular, there's a general system assumption that software can assume exclusive access to direct-attached disks (excluding exotic shared-disk filesystems), contrasting with the presumption that network nodes are able to simultaneously serve multiple clients that may connect to them. If you put the disks directly on the network, you would have to be very careful that your network config guarantees mutual exclusion of access to any one disk, or else a subsequent misconfiguration would cause multiple nodes to clobber each other's data on a single disk.

  • @giornikitop5373
    @giornikitop5373 2 years ago +1

    So, that little PCB acts as a micro-Linux box and presents the NVMe to the network, right? While this is great in terms of overall management and scalability, doesn't it increase net traffic and address usage by a huge margin? But I guess that's not a problem in the datacenters; they can up the equipment like it's nothing. Something similar that I remember was the very old AoE, but that was probably more similar to FC.

  • @christopherjackson2157
    @christopherjackson2157 2 years ago +2

    Yea u called it. Mind is blown.

  • @JohnADoe-pg1qk
    @JohnADoe-pg1qk 2 years ago +12

    Maybe a silly question, but what part in the system does the parity calculations for RAIDs in this setup?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +21

      On this particular one we did not do parity since it was RAID 0. However, the way we had it set up, it would be in the server/workstation. In the next video in this series, we will show it being done on a DPU for full offload.

    • @fat_pigeon
      @fat_pigeon 2 years ago +1

      Specifically, this was regular Linux software RAID (mdadm). You could also run ZFS over the block devices after connecting the workstation to them, or perhaps even a shared-disk file system.
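      A minimal sketch of that host-side flow, assuming the NVMe-oF namespaces are already connected and visible as local block devices (the device and mount names are hypothetical; this destroys whatever is on them):

      ```python
      import os
      import subprocess

      # Editor's sketch: stripe two already-connected NVMe-oF namespaces into a
      # RAID 0 md device with plain Linux software RAID (mdadm), then use it
      # like any local disk. Run as root.
      members = ["/dev/nvme1n1", "/dev/nvme2n1"]   # hypothetical remote namespaces

      subprocess.run(["mdadm", "--create", "/dev/md0", "--level=0",
                      f"--raid-devices={len(members)}", *members,
                      "--run"],                    # skip the interactive confirmation
                     check=True)

      # Put a filesystem on the stripe and mount it like any local disk.
      subprocess.run(["mkfs.ext4", "/dev/md0"], check=True)
      os.makedirs("/mnt/em6-raid0", exist_ok=True)
      subprocess.run(["mount", "/dev/md0", "/mnt/em6-raid0"], check=True)
      ```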

  • @winhtin3420
    @winhtin3420 2 years ago

    How much would that 80TB usable configuration box cost? Thanks.

  • @EverettVinzant
    @EverettVinzant 2 years ago +1

    Thanks!

  • @thishandleisntavailable1
    @thishandleisntavailable1 2 years ago +6

    "Honey, I accidentally granted the untrusted vlan direct access to our storage drives." - a horror film

  • @garmack12
    @garmack12 2 years ago +1

    Wendell from Level1Techs just did a video about how most RAID systems these days don't do error checking at the RAID controller level. Most just wait for the drive to report a data error. This must be true for this as well, correct?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago

      I think Wendell was doing hardware RAID. These are more for software defined storage solutions. Also, traditional "RAID" has really been used a lot less as folks move to scale out since you have other forms of redundancy from additional copies, erasure coding, and etc. The next video in this series that we do will be using a controller, but also software so a bit different than what he was doing.

  • @keithreyes3163
    @keithreyes3163 11 months ago +1

    I'm a little late to this party but from a purely tech-based standpoint, this is amazing! With the continuous advancements in speed, multitasking, and increase in lanes that the modern CPUs have, removing any and all pitstops, layovers, and roadblocks to its ability to compute will drastically improve efficiency and workflow, as well as reduce the amount of hardware needed to maintain that "hyperspace" workflow. Reducing power usage, etc. All great news, right? But after my nerdgasm passed, I realized one thing. We've been giving IP addresses to resources forever but now we're going to be doing it on steroids. We are dangerously low on properly trained security individuals to monitor and maintain our current networks. What happens when we expand our current network topologies from a couple dozen or a few hundred nodes to thousands or hundreds of thousands by giving even our HDs, GPUs, etc. IPs? What happens when someone hacks into your 22 drives simultaneously and takes the whole dataset offline or holds it for ransom? I'm all for advancement and I would never want to stop progression out of fear. But can someone please start the conversation about how we safeguard ourselves in this "hyperspace" work environment we will be advancing into?

  • @abx42
    @abx42 2 years ago

    I would love to have this as a home user

  • @I4get42
    @I4get42 2 years ago +3

    Hi Patrick! Very interesting. Is the NVMeoF target actually on these special drives, or is it in the Marvell board? It seems like it would make a big difference for what a drive replacement would look like (configuration and vendor lock-in). Also, curious: did Kioxia talk about the RoCE v2 prioritization? I'm hoping that the drives mark the traffic with DSCP, and the integrated switch defaults to trust/prioritize DSCP (not that there'd be a lot of other traffic on it). Lastly: ya, the DPU feels like the most drop-in solution to me. Then you'd ideally have it manage the disks with ZFS, or something like it, and present block devices to the hypervisor on the server it is sitting in. Or maybe even a VMware-on-the-DPU solution for next-level vSAN?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +4

      In the drives. The Marvell board shown is to convert standard NVMe drives

  • @Codiac300
    @Codiac300 2 years ago +1

    I see they're taking the all-IP thing to the next level.

  • @ForestNorgrove
    @ForestNorgrove 2 years ago

    Brantley Coile was doing this almost 20 years ago with ATA over Ethernet, deja vu all over again and good to see!

  • @varno
    @varno 2 years ago

    I do wonder, however, if 25Gbps is fast enough given that PCIe Gen 5 can do that on a single lane. But the configurability does seem worthwhile, and given that PCIe uses the same SerDes tech as electrical high-speed Ethernet, it should be possible to have one controller that can do both 100G Ethernet and PCIe in the same controller silicon.

  • @ralmslb
    @ralmslb 2 years ago +2

    oh man, this is amazing!
    I wish I was rich to be able to afford this as my home NAS :)

  • @spencerj.wang-marceau4822
    @spencerj.wang-marceau4822 2 years ago +1

    NVMeoF's nvme-cli feels a lot like the targetcli/iscsiadm cli tools merged into one.

  • @System0Error0Message
    @System0Error0Message 2 years ago +1

    this is gonna be good for ceph

  • @dawolfsdenyo
    @dawolfsdenyo 2 years ago +1

    Are they also looking at adding spinning-rust drive switches as well, for the mass storage needs that can be lower-tier storage with hot data on the NVMe switches? Coming from the age of being some of the first users of NAS and SAN products back in the 90s and spending the next 20 years in enterprise storage and enterprise architecture, this new topology is sexy as hell.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +1

      Hopefully will discuss that this week if the DPU piece gets out

    • @dawolfsdenyo
      @dawolfsdenyo 2 years ago

      @@ServeTheHomeVideo Looking forward to that, and everything else you bring out. STH the site has been a constant joy and so have your videos over the years! Thanks for these!

  • @tuxlovesyou
    @tuxlovesyou 2 years ago

    Does this solve the problems with concurrency in protocols like iSCSI?
    Nevermind, I think I answered my own question. It looks like you'd need to use a filesystem or DPU that supports mounting across multiple machines

  • @__--JY-Moe--__
    @__--JY-Moe--__ 2 years ago +1

    why do I see U as ten, and were standing in the middle of a sand lot. talking about new stuff!🤣.....this is very helpful when gathering ideas, to assemble the latest N greatest systems!! super-duper! great ideas! thanks Patrick! accessing data, right from the storage. will really cut down on overhead latency!!! nice! I can't believe they didn't make plastic skid covers though! it's so easy 2 loose ec's, over usage time! good luck!

  • @AndrewFrink
    @AndrewFrink 2 years ago

    I'm lost, how do you keep multiple things from writing to the same part of the same drive at the same time (or separate times?)?

    • @_TbT_
      @_TbT_ 2 years ago

      The same way other dual interface hard disks (SAS) do it.

  • @yiannos3009
    @yiannos3009 2 years ago +1

    Is there some way to manage array ownership in a distributed manner? In other words, is it possible to create an array on one machine such that all other storage clients on the network know that disks 1...n belong to array A?

    • @yiannos3009
      @yiannos3009 2 years ago

      I should clarify: know that disks 1...n belong to array A even if the array is not mounted, so that reallocation to another array would be prevented. Cool vid and tech btw, thanks!

  • @_TbT_
    @_TbT_ 2 years ago

    Definitely mind blown.

  • @jurepecar9092
    @jurepecar9092 2 years ago +3

    This will push RDMA / RoCE as a hard requirement on ethernet networks. Fun times ahead ...

  • @DrivingWithJake
    @DrivingWithJake 2 years ago

    It's interesting however, I wonder how it would really be in the data center world. We do a lot of high storage types of systems linked up with 40/100g ports. It is fun to dream of the ideas and usage for it.

    • @russellzauner
      @russellzauner 1 year ago +1

      it's going to erode the market for gigantic monolithic building sized data centers because it facilitates distributed computing.

  • @memyself879
    @memyself879 2 years ago +1

    Hi Patrick, why aren't EDSFF drives taking over by now?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +1

      No Genoa/ SPR yet. But in unit volumes they are growing by a huge amount.

  • @TheNorthRemember
    @TheNorthRemember 1 year ago +2

    Can someone please tell me use-cases for this SSD?

  • @stuartlunsford7556
    @stuartlunsford7556 2 years ago +2

    How was this not already a thing?? The instant composable infrastructure came about, this should have been there...holy crap lol.

    • @shodanxx
      @shodanxx 2 years ago

      I just started learning about 10g/100g Ethernet and SFP/qsfp and I was like why don't they make a 100gb qsfp m.2 slot, why is networking such an overengineered underperforming stagnant mess !

  • @mikeoxlong4043
    @mikeoxlong4043 2 years ago

    If I connect this to my router, do I get a super fast NAS?

  • @guy_autordie
    @guy_autordie 2 years ago +7

    So the bottom line is having a disk-as-a-node, with DPU and interface. You make a configuration and add new chassis-nodes as needed.
    The only difference I see is the client machine asking the drive/array directly for the data. You still need some compute somewhere to handle that (the DPUs). Therefore, it's still a computer with a bunch of drives.
    Still, for me, it's like the size of the universe, it's hard to conceive.
    Maybe it will make more sense with the second video.

  • @MoraFermi
    @MoraFermi 2 years ago +5

    1. That sad feeling when all of technologies mentioned as "ancient" are still fairly new to you...
    2. Can you install Ceph OSD on these drives?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +5

      We were not running stacks on the drives themselves other than NVMeoF, but you get the idea of where this is going. Next up, we will have a DPU version of this.

    • @novafire99
      @novafire99 2 years ago +3

      That was my first thought, this could be really cool if it could turn into a Ceph cluster of drives.

    • @AndrewMerts
      @AndrewMerts 2 years ago

      @@novafire99 Ceph actually did have an experimental drive in partnership with WD that ran the OSDs on the drives themselves and used 2.5G ethernet (1/10th what these NVMe drives do but it's HDD so... shrug). It was a really cool idea but managing OSDs and Ceph on the individual drives is something that's definitely a bit clunky because now you need extra ram and compute on each drive controller board. You're not eliminating those extra layers as far as the inefficiencies are concerned so much as just moving the microservers running your OSDs onto the same board. With NVMe-oF direct to the drives instead of adding a heavyweight Linux daemon to each drive you're only adding a network stack and an embedded Linux host mostly as a control plane and your data plane can still offload the bulk of the work.
      Aside from the reduction in processing, your interface is now much simpler. It's NVMe-oF, thats a much smaller jump to bridge from NVMe to NVMe-oF than it is to bridge from SATA to Ceph OSD. Yes it's still having to deal with authentication, encryption, session management, etc. and you can expect needing more FW updates but nothing like having to manage a bunch of Linux servers in your Ceph cluster. Having that logical separation with a clean, stable API avoids the added complexity of combining higher level storage cluster stuff in Ceph with lower level drives.

  • @strongium9900
    @strongium9900 2 years ago

    Cost brother. How much would this cost. Especially for a home user

  • @0xEmmy
    @0xEmmy 2 years ago

    So what I'm hearing, is that each NVMe drive is a near-zero-overhead ethernet-native NAS block device (as opposed to a limited component requiring a direct CPU attachment), and then you'd run the filesystem either on a dedicated, more traditional "heavy" NAS, or on the client itself.
    I wonder what kind of innovations this will enable elsewhere in the system. Maybe it'll be easier to design purpose-specific RAID accelerators (ideally compatible with a standard format like ZFS), once the drives aren't on the same PCIe bus. Maybe the individual parts of a modern filesystem (drive switching, parity calcs, caching, and file->block associations) could be separated into purpose-specific hardware modules.
    Or, maybe consumer operating systems start supporting NVMe-over-WiFi/Ethernet. Maybe NVMe gains network discovery features. Maybe drives with gigabit ports start showing up marketed to consumers, cheaper to use as a NAS directly than building an entire server around an NVMe-over-PCIe drive. (A single controller chip will probably under-cut even a Raspberry Pi, once optimized for cost.)

    • @movax20h
      @movax20h 2 years ago

      > Or, maybe consumer operating systems start supporting NVMe-over-WiFi/Ethernet.
      Already possible.
      > Maybe NVMe gains network discovery features.
      Already possible.
      > Maybe drives with gigabit ports start showing up marketed to consumers,
      1Gbps is way too slow. You are putting an expensive SSD that can do 500-4000 MB/s (even cheapo SSDs can do it) behind a link limited to about 120 MB/s, with worse latencies too. 10Gbps minimum for it to be useful.

  • @CoolFire666
    @CoolFire666 2 years ago +1

    How do you manage security and access control on a setup like this?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago

      The DPU provider does, not the server provider

  • @suchness18
    @suchness18 2 years ago +1

    Bit confused about the nomenclature: if this is using IP addresses, wouldn't it be NVMe over IP, not NVMe over Ethernet?

    • @popcorny007
      @popcorny007 2 years ago +3

      The underlying difference isn't that it uses IP, it's that NVMe is not exposed outside of the disk.
      It's more like: NVMe over Ethernet, therefore a MAC address is the lowest OSI layer for a directly connected device (ie. the switch chip) to access it.
      As opposed to: NVMe over PCI, therefore a PCI address is the lowest OSI layer for a directly connected device (ie. the CPU) to access it.
      No need to jump to OSI layer 3 with IP addresses.
      TLDR: Normal NVMe devices are "over PCI", which is OSI layer 1. With NVMe over Ethernet, the lowest accessible OSI layer is now layer 2 (Ethernet/MAC).
      FINAL EDIT (lol): I'm referring to all layer 2 protocols as "Ethernet" for simplicity. "Over Fabric" encompasses all layer 2 connections, such as Infiniband.

  • @that_kca
    @that_kca 2 years ago +1

    So close to getting kinetic v2 on there

  • @im.thatoneguy
    @im.thatoneguy 2 years ago +2

    I'm curious how this scales based on price. Presumably the dual 25GbE controllers "in" each drive are pretty expensive and drive up the cost. Is that cheaper than relying on 6x dual-200GbE DPUs to expose those PCIe lanes to the fabric?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +3

      What the goal of this is (in the future) is to have the flash controller speak Ethernet not PCIe. So the incremental cost is very small.

    • @koma-k
      @koma-k 2 years ago

      @@ServeTheHomeVideo does it have an effect on power consumption though? Ethernet is meant for longer distances than PCIe, thus requiring more power... How does a rack of these fare power-wise compared to more "conventional" alternatives?

    • @eDoc2020
      @eDoc2020 2 years ago +1

      @@koma-k There are different variants of Ethernet. Presumably the drives use a variant like 25GBASE-KR which only has 1m range.

  • @PrestonBannister
    @PrestonBannister 2 years ago

    Think these might reach the more general market in 2-3 years?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  1 year ago

      Hi Preston! Thank you for joining! I think for the general market, it is going to take some time. A lot of that is just based on how these drives are being marketed. If we saw an industrywide push, it would be much faster. I also think that as DPUs become more common, something like the EM6 starts to make a lot more sense since the infrastructure provider can then just pull storage targets over the network and then provision/ do erasure coding directly on the DPU and present it to client VMs or even the bare metal server

  • @bones549
    @bones549 2 years ago

    My question is cost per gig?

  • @dmytrokyrychuk7049
    @dmytrokyrychuk7049 2 years ago

    Uhm, can I add a consumer SSD to a normal home Ethernet switch?

  • @gcs8
    @gcs8 2 years ago +4

    lol, now do ZFS on a DPU.
    More real note, I think this is cool for things that need access to a physical drive but may not have the chassis for it, but I don't think it's going to replace a SAN for things like a clustered filesystem (VMFS/vVOLs).

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +4

      Video is already recorded with the DPU :-)

    • @TotesCray
      @TotesCray 2 years ago

      @@ServeTheHomeVideo eagerly awaiting the upload!

  • @strandvaskeren
    @strandvaskeren 2 years ago +1

    How is this different in concept from a bunch of Odroid HC1's? Each drive gets its own tiny Debian server and feeds a network port; sounds pretty much like this thing, only this thing is more modern and faster. So rather than managing 4 servers with 24 drives each, you now manage 96 servers with 1 drive each. What am I missing?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago

      Sure, but it is like a bicycle to a Boeing 787 and saying both are modes of transportation but one is more modern and faster.

    • @strandvaskeren
      @strandvaskeren 2 years ago

      @@ServeTheHomeVideo All I'm saying is that sticking a tiny server onto each drive has been done before, so the main new thing here is sliding 24 of those single-drive servers into a 2U enclosure with a fancy switch interface, rather than powering them and network-connecting them individually. The really exciting bit is how to manage those 24 servers so you don't need to micro-manage each one individually.

  • @ProTechShow
    @ProTechShow 2 years ago +1

    This is cool. Scalability is the obvious benefit, but I bet people come up with weird and wonderful ways to use it once it goes more mainstream.

  • @johng.1703
    @johng.1703 2 years ago

    Ah, so it's not running on x86, it's running on x64... but there is still a lot of translation going on. Granted, where the translation is happening has moved, but it is still happening.
    The ONLY real difference is the communication between the drive and the controller: rather than it being serial, it is instead using Ethernet.
    NVMe over Fabrics SSD, so it is still a serial device (that would be the NVMe part), then there is a controller sat out in front doing the conversion...
    This looks like some sort of iSCSI connection, granted they changed the command set to NVMe.
    The network diagram @12:15 is also not correct, or is there a large part of the network missing? Or, rather than the big box "switch", have these been connected individually to the network with a device we haven't seen?

  • @kenzieduckmoo
    @kenzieduckmoo 2 years ago

    I was wondering when you would get a video on these since I saw them in Linus's petabyte-of-flash video.

    • @slithery9291
      @slithery9291 2 years ago

      This isn't what Linus used at all. Their project is standard NVME drives directly connected to your everyday x86 server.
      This is way beyond that type of setup...

    • @_TbT_
      @_TbT_ 2 years ago

      Linus also uses Kioxia drives. That’s where the similarities end.

  • @zachradabaugh4925
    @zachradabaugh4925 2 years ago +1

    3:15 Wait, are you using film for product photos of bleeding-edge tech? If so, I'm 100% down for it!

    • @blancfilms
      @blancfilms 2 years ago +1

      That doesn't look like analog noise to me

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +3

      The lab was SUPER dark. Then add to the fact that there is a metal box (rack) around these systems. That is a Canon R5 (R5 C was not out yet) at ISO 12800 just to get something somewhat viewable.

    • @zachradabaugh4925
      @zachradabaugh4925 2 years ago +1

      @@ServeTheHomeVideo fair enough! Data centers aren’t really known for perfect lighting. Honestly cool to see that the R5 has such usable photos at iso 12800

  • @SudharshanA97
    @SudharshanA97 2 years ago +1

    Wow!
    This is what they can use like for Supercomputers right?

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +1

      The market right now for these is more in scale-out data centers. Think folks running cloud-like models.

  • @bondreacristian1194
    @bondreacristian1194 1 year ago

    How much does the Ingrasys ES2000 cost?

  • @linuxgeex
    @linuxgeex 2 years ago +1

    Wow Patrick must have eaten an entire box of Frosted Flakes before he did this video [ they're Grrrrrrrreat! ]

  • @shodanxx
    @shodanxx 2 years ago +1

    Like, that's iSCSI, a thing I learned about this week after having forgotten about it for 20 years. The reason I found it was I need to build a 10-node cluster out of computers from the recycling. Honestly I'm shocked we haven't had IP-capable NVMe drives from the start. This is obviously because SAN sellers have been blocking it from existing. My whole week has been: how do I duct tape a $40 100G PCIe card to a $200 2TB NVMe drive, preferably without using any fans.

    • @axiom1650
      @axiom1650 2 years ago

      40$ 100g pcie card?!

    • @_TbT_
      @_TbT_ 2 years ago

      @@axiom1650 some used Mellanoxes can be had quite cheap on eBay. 10Gig for 40$ I have seen myself. 100Gig is a stretch.

  • @bw_merlin
    @bw_merlin 2 years ago

    This sounds like true network attached storage.

  • @Trains-With-Shane
    @Trains-With-Shane 2 years ago +1

    So it's creating a SAN while cutting out a lot of the middleware. That's pretty cool.

  • @russellzauner
    @russellzauner 1 year ago +1

    *chuckles in WiGig distributed storage*

  • @thx1200
    @thx1200 2 years ago +6

    Jesus we're going to need to firewall our disks now. 😀

  • @movax20h
    @movax20h 2 years ago +1

    Please let us run a Ceph OSD daemon (with Clevis/Tang for encryption) directly on this board. That would be so cool. I have dreamed of this for years, but finally we are closer. An octa-core A75 with 8GB of RAM and 4GB of NAND for the OS would be absolutely all I need.
    Accessing 25Gbps drives from one (or a few) servers using NVMe directly will not scale (how much can you put in one server, 800Gbps maybe?). With Ceph, each client connects independently and you can easily saturate a few Tbps with enough clients.
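    Rough numbers behind that scaling argument (editor's sketch, assuming a 24-bay 2U chassis like the one shown, with dual 25GbE per drive; the client NIC figure is hypothetical):

    ```python
    # Editor's back-of-envelope: aggregate fabric bandwidth of an Ethernet-SSD
    # chassis vs. what a single initiator host can pull through its own NICs.
    drives, ports_per_drive, gbps_per_port = 24, 2, 25
    chassis_aggregate_gbps = drives * ports_per_drive * gbps_per_port   # 1200 Gbps

    single_host_nics_gbps = 2 * 200   # e.g. one host with dual 200GbE (hypothetical)

    print(f"Chassis aggregate: {chassis_aggregate_gbps} Gbps")
    print(f"One dual-200GbE host can use at most {single_host_nics_gbps} Gbps of that "
          f"({100 * single_host_nics_gbps / chassis_aggregate_gbps:.0f}%); "
          "the rest is only reachable by putting more clients on the fabric.")
    ```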

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago

      So, stay tuned, hopefully late this week we will have the DPU version of this. Not doing Ceph, but you will recognize some of the other concepts.

  • @mamdouh-Tawadros
    @mamdouh-Tawadros 2 years ago +2

    I wonder, is that equivalent to a router being connected to mass storage through USB3?

    • @marcogenovesi8570
      @marcogenovesi8570 2 years ago

      in the sense that it is storage available over the network, yes.

    • @_TbT_
      @_TbT_ 2 years ago

      Not really. It’s like plugging the hard drive directly into a switch/router.

  • @woodshop2300
    @woodshop2300 2 years ago +1

    Some custom ASIC in the SONiC switches could let the switches do RAID on their own... a RAID controller on steroids, LOL.
    Or maybe just put a Ryzen APU in there as the x86 control plane :) The integrated Vega should be able to do DPU work, I'd think.

  • @virtualben89
    @virtualben89 2 years ago +1

    Ethernet to rule them all.

  • @SLYKER001
    @SLYKER001 2 years ago +1

    Soooo, instead of having one big server for a bunch of disks, now we have a bunch of disks, each with an integrated server; the main benefit which I see is more reliability.
    Hmmm, can a single drive or a pair be mounted as storage in Windows? :D

  • @chwaee
    @chwaee 2 years ago

    Security implications? First generation is bound to have zero days... Will make for some interesting news in a couple years :D

  • @OVERKILL_PINBALL
    @OVERKILL_PINBALL 2 years ago +1

    Who manages these drives, the storage team or the networking team? : P

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +1

      And... Fight! :-)

    • @mrmotofy
      @mrmotofy 2 years ago

      Who? The guy who has lunch with the boss :) like always

  • @platin2148
    @platin2148 2 years ago +1

    This is true craziness. Is it 50Gbps or 25Gbps? I only saw something like dual 25G.

    • @ServeTheHomeVideo
      @ServeTheHomeVideo  2 years ago +1

      Dual 25GbE in this demo, but you are correct, new generations of NVMeoF drives can go to faster speeds super easily.

  • @thenwhoami
    @thenwhoami 1 year ago +1

    *mind blown*

  • @BloodyIron
    @BloodyIron 2 years ago

    Considering the redundancy is external to the system this sounds like it's very easy to accidentally remove the wrong drive for hot-swap replacement and incur very real data loss. :/ I'm thinking this primarily when Patrick started talking about namespace slicing. I don't yet see what the advantage of this topology is.

  • @ymeasureacher7390
    @ymeasureacher7390 7 months ago

    network attached storage as it must be

  • @cdoublejj
    @cdoublejj 2 years ago

    Makes me think it's sort of like a hardware iSCSI.

  • @klafbang
    @klafbang 2 years ago

    Neat!