Thunderbolt Networking is FAST & CHEAP!

  • Published 1 Jul 2024
  • In this video I show how I created a Thunderbolt 4 network ring to cluster my Proxmox nodes. The MS-01 is a perfect option for this approach. Is it the ultimate "budget" cluster?
    Thanks to Scyto: gist.github.com/scyto/76e9483...
    Minis Forum MS-01: amzn.to/3V9DkAa
    Cable Matters Thunderbolt: amzn.to/4bOOZtU
    Samsung 980 Pro: amzn.to/4dSaxaS
    Corsair Vengeance: amzn.to/44OehWF
    GitHub:
    github.com/JamesTurland/JimsG...
    Recommended Hardware: github.com/JamesTurland/JimsG...
    Discord: / discord
    Twitter: / jimsgarage_
    Reddit: / jims-garage
    GitHub: github.com/JamesTurland/JimsG...
    00:00 - Introduction to Video
    00:30 - Unifi Aggregation Switch Integration
    03:22 - Thunderbolt 4 Ring Network
    05:04 - Thunderbolt 4 Demo (Speed Test)
    08:32 - Thunderbolt 4 Configuration
    23:32 - Checking
    25:12 - Using the Thunderbolt 4 Ring for Proxmox
    27:20 - Create Proxmox Cluster
  • Science & Technology

Comments • 102

  • @rafrackowiak
    @rafrackowiak months ago +9

    This work of yours is spectacular!

    • @Jims-Garage
      @Jims-Garage  months ago +1

      Thanks, glad it was useful! Anything you want to know give me a shout and I'll try to answer.

  •  months ago +2

    Thank you for breaking it down. Answered all the questions I didn't even know I had!

    • @Jims-Garage
      @Jims-Garage  months ago

      Glad it was helpful!

  • @simo47768
    @simo47768 months ago +2

    Wow. Amazing video again.

    • @Jims-Garage
      @Jims-Garage  months ago

      Glad you enjoyed it!

  • @ob2522
    @ob2522 12 days ago +1

    tmux is great for running stuff in the background, I use it all the time for switching between nodes in my cluster.
    Thanks for a fascinating vid!

    • @Jims-Garage
      @Jims-Garage  11 days ago

      Thanks, tmux is on the list 👍

  • @TradersTradingEdge
    @TradersTradingEdge months ago +2

    Holy smoke :) That's awesome. Thanks for that.

    • @Jims-Garage
      @Jims-Garage  months ago

      Thanks 👍

  • @francoismartin5578
    @francoismartin5578 months ago +1

    Hello! Really, thanks for this great video ❤

    • @Jims-Garage
      @Jims-Garage  months ago

      Thanks for visiting

  • @squidiebah
    @squidiebah months ago +2

    Thanks for the video. I've been wanting to use USB4 between two Linux cluster nodes for a while and was curious about stability. The thunderbolt-net module supports networking, thankfully.

    • @Jims-Garage
      @Jims-Garage  months ago

      Great to hear! Let me know how that goes. It's a great way to have fast, "cheap" networking.
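A quick way to confirm that the thunderbolt-net module mentioned above is actually loaded on a node is to look for it in /proc/modules. The helper below is a hypothetical sketch, not part of the video or Scyto's gist; it assumes a Linux host and relies only on the kernel listing module names with underscores (thunderbolt_net).

```python
from pathlib import Path


def module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if `name` appears as a loaded module in /proc/modules text.

    Each line of /proc/modules starts with the module name, and the kernel
    reports dashes as underscores (thunderbolt-net shows up as thunderbolt_net).
    """
    wanted = name.replace("-", "_")
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields and fields[0] == wanted:
            return True
    return False


if __name__ == "__main__":
    proc = Path("/proc/modules")
    if proc.exists():  # only meaningful on Linux
        print("thunderbolt_net loaded:", module_loaded("thunderbolt-net", proc.read_text()))
```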

  • @scotteecarr
    @scotteecarr months ago

    This is an incredible bit of kit. I wish these MS-01s had existed when I built my cluster. I'm pretty sure you'll need more than 3 nodes if you plan to run Ceph, as 3 is the bare minimum. You don't want split brain, and 3 nodes won't allow you to perform maintenance.

    • @scytob
      @scytob 29 days ago +1

      Works fine with one node down. I found that out the hard way, lol: it stayed working, performance was great, and it gave me time to get node 3 back up and running.
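The disagreement above comes down to majority-vote arithmetic. Proxmox's corosync quorum and Ceph's monitors both need a strict majority of voters, so a minimal sketch, assuming one vote per node (the default), shows why three nodes survive one failure but not two:

```python
def quorum(nodes: int) -> int:
    """Smallest strict majority of `nodes` votes."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can be down while the rest still hold a majority."""
    return nodes - quorum(nodes)


for n in (2, 3, 4, 5):
    print(f"{n} nodes: quorum={quorum(n)}, can lose {tolerated_failures(n)}")
```

Quorum is only half the story: Ceph's pool replica settings (size/min_size) decide whether I/O keeps flowing, and with three nodes there is no spare host to re-replicate onto while one is down, which is the maintenance concern raised above.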

  • @Mad-Jam
    @Mad-Jam months ago +8

    Search on YT for a Video called "USB4 & Thunderbolt's TRUE Speed: Only 22Gbps!"
    He nailed it.

    • @Jims-Garage
      @Jims-Garage  months ago

      Thanks, I will take a look at it.

    • @scytob
      @scytob 29 days ago +2

      It's 26 Gbps on Intel NUCs; it varies between systems due to the DMA controller. The MS-01 only appears to be 20 Gbps USB4, not 40 Gbps TB4, which is why you see a difference. It doesn't really matter, as that 20% doesn't make much real-world difference.

    • @Jims-Garage
      @Jims-Garage  29 days ago

      @@scytob assuming that's because the 4 lanes are split 2x2 with ports? Traditionally you'd have a full 4 lanes?

    • @scytob
      @scytob 29 days ago

      @@Jims-Garage I believe it's a combination of how many retimers are used and PCIe Gen3 x1 vs Gen3 x2. For example, one could have Gen3 x2 but not implement enough retimers. It also depends on how much bandwidth is allocated to interdomain traffic vs. tunneling, and whether that allocation is fixed or dynamic on a specific implementation. Also, USB4 is not USB in the old serial-bus sense; it is actually a routed protocol that supports DP, USB3 and PCIe tunneling. This is why your statement that Thunderbolt networking was deprecated isn't correct: it just wasn't being well tested because they were focused on getting the core USB4 interdomain routing working first. I found some critical interdomain bugs that needed to be fixed, and they fixed the IPv6 bugs as part of that. Anyhoo, this is all based on my (mis?)reading of the USB4 specification documents. Bandwidth allocation is documented on page 60 of the USB4 connection manager guide (it's a heavy read, lol); also look at the interdomain service guide to see why Thunderbolt networking was absolutely foundational too.

    • @scytob
      @scytob 29 days ago

      @@Jims-Garage Or, to say it another way: bandwidth allocation between PCIe, USB3, DP tunneling, etc. is done dynamically across all ports. The lanes are not split per port; it's the total number of lanes that matters, nothing else (unless someone does something silly like one Thunderbolt controller per port...).

  • @LaurenceHartje
    @LaurenceHartje months ago +7

    Thanks! I've been considering this exact same setup since the MS-01 was announced, but wanted to see some real life experiences before financially committing to this setup (as you mentioned thunderbolt Ethernet was in a strange state at that time). Looking forward to seeing the Ceph results and performance over the thunderbolt Ethernet.

    • @Jims-Garage
      @Jims-Garage  months ago +2

      Thanks, glad it was useful. I'll be covering that part soon.

  • @TheMongolPrime
    @TheMongolPrime months ago +1

    Love the node names. I knew I liked you.

    • @Jims-Garage
      @Jims-Garage  months ago

      Haha, thanks! For The Emperor and Sanguinius!

  • @ewenchan1239
    @ewenchan1239 months ago +5

    Thunderbolt networking is cheap if you have three MS-01s, which are at least $439 USD barebones or $629 USD for the Core i5 12600H version with 32 GB of RAM and a 1 TB SSD. But if you want the top-of-the-line, pre-configured model from Minisforum, each node will set you back $829 USD, which means that three nodes plus Thunderbolt cables will run you closer to $2500 USD in total.

    • @Jims-Garage
      @Jims-Garage  months ago

      True for the MS-01 but many consumer devices have a couple of thunderbolt ports. It also means you don't have to buy additional adapters.

    • @ewenchan1239
      @ewenchan1239 months ago +1

      @@Jims-Garage
      Varies.
      Depends on what it is, and its respective age.
      My 7th gen NUC - no.
      8th gen NUC has one Thunderbolt 3 port on it.
      None of the rest of my systems, not even my Asus Z690 Prime P motherboard (for 12th gen Intel) has Thunderbolt on it.
      So it REALLY depends.

    • @scytob
      @scytob 29 days ago +1

      For folks who have already decided on NUC-style form factors with TB, this is basically 'free' compared to buying three TB 10GbE adapters....

    • @scytob
      @scytob 29 days ago

      @@ewenchan1239 And none of that invalidates that it's basically free for folks who are making their purchase decisions now...

    • @ewenchan1239
      @ewenchan1239 29 days ago

      @@scytob
      If you have a NUC that only has a single TB port, then you won't be able to do this (per Wendell) in terms of being able to set up a near token ring type of network using nCr(3,2) combinations of ports on systems. The best that you'd be able to do with a single port on each NUC is a single pair of point-to-point connection, which means that the 3rd system will NOT be connected to said Thunderbolt-based high(er) speed network.
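The cabling and cost figures in this thread can be checked with a few lines. This is a sketch under stated assumptions: the hostnames and cable price are hypothetical placeholders, and it models the three-node "ring" as a full mesh, where every node has a direct cable to every other node (for three nodes the two are the same).

```python
from itertools import combinations


def ring_links(nodes: list[str]) -> list[tuple[str, str]]:
    """One direct cable per pair of nodes (full mesh): C(n, 2) cables."""
    return list(combinations(nodes, 2))


def ports_needed_per_node(nodes: list[str]) -> int:
    # In a full mesh each node connects to every other node.
    return len(nodes) - 1


def cluster_cost(node_price: float, node_count: int, cable_price: float, cable_count: int) -> float:
    return node_price * node_count + cable_price * cable_count


nodes = ["node1", "node2", "node3"]  # hypothetical hostnames
links = ring_links(nodes)
print(links)
print("ports per node:", ports_needed_per_node(nodes))
# $829 per top-spec MS-01 as quoted above; cable price left at 0 (hypothetical).
print("nodes only:", cluster_cost(829, len(nodes), 0, len(links)))
```

With one Thunderbolt port per node, ports_needed_per_node would have to be at most 1, which only covers a two-node point-to-point link; that is the limitation described above.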

  • @ryanmalone2681
    @ryanmalone2681 months ago +1

    Like the sweater, but I’m really missing the pastel cardigans. Video isn’t bad either 😉. Appreciate you.

    • @Jims-Garage
      @Jims-Garage  months ago

      Haha, thanks. I will try to sort out my attire ASAP

  • @BZFFirst
    @BZFFirst months ago +3

    Hi. Would you do a video on how to configure SR-IOV on the Minisforum, and how to use it in VMs?

    • @Jims-Garage
      @Jims-Garage  months ago +2

      Yes, I'll likely cover in the near future

    • @scytob
      @scytob 29 days ago +1

      @@Jims-Garage Even I want this; I can't get it to be reliable.....

  • @JeramiFrost
    @JeramiFrost 23 days ago +1

    This may be inconsequential, but at roughly 25:50 you are talking about using the IPv4 link you created, but the page that shows your migration settings has the IPv6 address selected. Does this matter?
    Edit: After going back to the beginning of the video and doing the ping test, I answered my own question. I could not ping with the IPv4 selection. I switched it to IPv6 like yours showed and was able to ping successfully.
    Thanks for an awesome and super helpful video.

    • @Jims-Garage
      @Jims-Garage  23 days ago

      You're welcome, thanks 👍

  • @uvalleza
    @uvalleza months ago +2

    Jim, in your running config I am not seeing "ip router openfabric 1", only "ipv6 router openfabric 1", which explains why you can't use IPv4 (this is in the interface en05/en06 section when you showed "show running-config"). FYI, I went through this like 3 times because for some reason it sometimes doesn't take the setting, so I had to redo it until it picked it up. I did this for my UM790 Pros and have MS-01s on the way, so I can't wait to set them up like this :) Just discovered your channel and loving the content, man. Keep it up!

    • @Khaldrogo5
      @Khaldrogo5 months ago +1

      Happened to me too. The first time I typed "ip router openfabric 1" in the FRR config, it gave the error "Couldn't bring up interface, please check log.". If you just re-enter "ip router openfabric 1", the error goes away and it gets added to the config.

    • @scytob
      @scytob 29 days ago +1

      Nice find. Running cat /etc/frr/frr.conf will show what FRR has committed (note: NEVER edit this file if it is wrong; use the FRR console).

    • @scytob
      @scytob 29 days ago +1

      I made some edits to the gist that might avoid this. It's unclear why this broke on some systems; I wonder if it is a bug. The key is that vtysh -c "show running-config" will show whether the config took or not.
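Following the tip to verify with vtysh, here is a hypothetical checker that parses show running-config output and reports which interfaces are missing the IPv4 or IPv6 OpenFabric line. The interface names (en05, en06) and area name (1) are taken from the comments above; the parsing assumes FRR's usual layout of an interface line followed by indented settings and an exit or ! terminator, which may differ between FRR versions.

```python
import shutil
import subprocess

# Lines the comments above say each Thunderbolt interface stanza should carry.
REQUIRED = ("ip router openfabric 1", "ipv6 router openfabric 1")


def interface_stanzas(config: str) -> dict[str, set[str]]:
    """Map each 'interface NAME' block in FRR config text to its indented lines."""
    stanzas: dict[str, set[str]] = {}
    current = None
    for raw in config.splitlines():
        line = raw.strip()
        if line.startswith("interface "):
            current = line.split()[1]
            stanzas.setdefault(current, set())
        elif line in ("!", "exit") or not raw.startswith(" "):
            current = None  # a terminator or any top-level line ends the block
        elif current is not None:
            stanzas[current].add(line)
    return stanzas


def missing_lines(config: str, interfaces=("en05", "en06"), required=REQUIRED) -> dict[str, list[str]]:
    """Return {interface: [missing required lines]} for interfaces that lack any."""
    stanzas = interface_stanzas(config)
    report: dict[str, list[str]] = {}
    for name in interfaces:
        present = stanzas.get(name, set())
        absent = [req for req in required if req not in present]
        if absent:
            report[name] = absent
    return report


if __name__ == "__main__":
    # Only query FRR when vtysh is installed (it usually needs root).
    if shutil.which("vtysh"):
        result = subprocess.run(
            ["vtysh", "-c", "show running-config"],
            capture_output=True, text=True, check=True,
        )
        print(missing_lines(result.stdout) or "all OpenFabric lines present")
```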

  • @mihaitamas
    @mihaitamas 27 days ago +1

    For max performance you will need to set iperf3's affinity to the performance cores. That will get you to 26 Gbit/s across the board.

    • @Jims-Garage
      @Jims-Garage  27 days ago

      I believe for the MS-01 it's due to the PCIe lanes being shared between ports and DMA, not CPU affinity.

    • @mihaitamas
      @mihaitamas 21 days ago

      @@Jims-Garage I highly doubt it: before setting the affinity I could not get more than 18-21 Gbps, with a LOT of retries on one of the nodes (more than 8K retries on a 60-second iperf test). What I am trying to say is that you might be lucky, but it's better to choose the cores and have consistent performance rather than try to guess what's wrong. ;)
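iperf3 can pin itself to a CPU with its -A/--affinity option, so one way to try the suggestion above is to find the P-cores and pass one of them to -A. This is a minimal sketch: the /sys/devices/cpu_core/cpus path is an assumption that should hold on Linux with a hybrid Intel CPU (where the kernel exposes separate P-core and E-core PMUs) and may be absent elsewhere, and the peer address is hypothetical.

```python
from pathlib import Path

# Assumption: Linux on a hybrid Intel CPU exposes the P-core list here.
P_CORE_LIST = Path("/sys/devices/cpu_core/cpus")


def parse_cpu_list(text: str) -> list[int]:
    """Expand a kernel CPU list string such as '0-11' or '0-3,8,10-11'."""
    cpus: list[int] = []
    for part in text.strip().split(","):
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        elif part:
            cpus.append(int(part))
    return cpus


def iperf3_client_argv(server: str, cpu: int, seconds: int = 60) -> list[str]:
    """Build an iperf3 client command pinned to one CPU with iperf3's -A option."""
    return ["iperf3", "-c", server, "-t", str(seconds), "-A", str(cpu)]


if __name__ == "__main__":
    if P_CORE_LIST.exists():
        p_cores = parse_cpu_list(P_CORE_LIST.read_text())
        print("P-cores:", p_cores)
        # Hypothetical peer address; substitute the other node's address.
        print(" ".join(iperf3_client_argv("fc00::82", p_cores[0])))
    else:
        print("No P-core list found; pick CPUs manually (e.g. from lscpu).")
```

The same -A option can be given to the server side (iperf3 -s -A N) so that both ends of the test stay on performance cores.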

  • @cberthe067
    @cberthe067 months ago

    When setting up Ceph, use erasure coding and not replication to maximize disk space... You may need to create it manually with shell commands since, if I remember correctly, Proxmox only supports replication...

    • @scytob
      @scytob 29 days ago +1

      Erasure coding has its downsides in terms of latency and reliability in a 3-node cluster.
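The space trade-off behind this exchange is a pair of simple ratios. A sketch, assuming a hypothetical three-node cluster with one 2 TB OSD per node: a replicated pool with size 3 stores three full copies, while an erasure-coded pool with k data chunks and m coding chunks stores (k + m)/k times the data. With a host failure domain an EC pool needs at least k + m hosts, so three nodes limit you to k = 2, m = 1.

```python
def usable_fraction_replicated(size: int) -> float:
    """Replicated pool: every object is stored `size` times."""
    return 1 / size


def usable_fraction_ec(k: int, m: int) -> float:
    """Erasure-coded pool: k data chunks plus m coding chunks per object."""
    return k / (k + m)


raw_tb = 3 * 2.0  # hypothetical: three nodes with one 2 TB OSD each
print(f"replica size 3: {raw_tb * usable_fraction_replicated(3):.1f} TB usable")
print(f"EC k=2, m=1:    {raw_tb * usable_fraction_ec(2, 1):.1f} TB usable")
```

The extra space comes with the costs mentioned in the reply: Ceph's default min_size for an EC pool is k + 1, so a 2+1 pool on three hosts can stop serving I/O while one host is down, and encoding plus read-modify-write on partial overwrites can add latency.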

  • @cryptodendrum
    @cryptodendrum 28 days ago +1

    I wonder how Apple Thunderbolt 4 cables would perform. Would they cure your retransmission error count, or would the observed link speed be any faster?

    • @Jims-Garage
      @Jims-Garage  28 days ago +1

      I can't imagine it would make a difference, it's a reputable cable that meets the specifications. The speed is likely limited by 2x PCIe lanes and DMA.

    • @cryptodendrum
      @cryptodendrum 21 days ago +1

      @@Jims-Garage Thanks for the advice. My Cable Matters Thunderbolt cables arrived, along with my RAM and NVMe drives. Just waiting for the computers for our new cluster to arrive now. Looking forward to getting started with it.
      FYI, it'll replace an old Protectli i5 unit from 2018 and two old Mac minis, all of which currently run XCP-ng, and I plan to try running XCP-ng on my new MS-01 cluster.

  • @totoro1596
    @totoro1596 months ago +1

    Cool stuff! Why LVM instead of ZFS?

    • @Jims-Garage
      @Jims-Garage  months ago +1

      For speed: this is the only 4x4 slot. I also back up daily and replicate, so infrastructure isn't a problem.

  • @jacobburgin826
    @jacobburgin826 months ago +1

    Most of us are struggling with IPv4, especially after a reboot. I ended up taking it out and just using IPv6.
    Restarting the frr service usually brings IPv4 back up, but rebooting the system kills it again.

    • @scytob
      @scytob 29 days ago

      IPv4 takes longer to converge. I never figured out why, as IPv6 is better suited to this task IMO.

  • @MarkConstable
    @MarkConstable months ago +1

    I've been using Ceph on my little mongrel cluster since early last year and it's been fine on a 2.5GbE fabric (started off with 1GbE).
    BTW, you mentioned passing through a GPU to containers in a k3s cluster, so I suspect you are using SR-IOV and not full iGPU passthrough.

    • @Jims-Garage
      @Jims-Garage  months ago +2

      For now it's full passthrough with 3 agent nodes. I will be looking into SR-IOV but with kernel 6.8 the custom DKMS doesn't work.

    • @MarkConstable
      @MarkConstable months ago

      @@Jims-Garage Cool. It would be really excellent if you could detail your exact settings somewhere, because I'm not aware of anyone else who has cracked this particular procedure.

    • @cberthe067
      @cberthe067 months ago

      Is SR-IOV able to share a GPU among VMs/LXC containers?

  • @crc-error-7968
    @crc-error-7968 months ago +1

    Ciao Jim, what is the power consumption of the three MS-01s? Idle, with a few VMs, etc.?

    • @Jims-Garage
      @Jims-Garage  months ago +1

      This cluster is currently pulling 160W with what you see. I suspect you could get lower with core pinning but I haven't done that yet. I simply wanted to migrate over ASAP as that would give me power savings and speed improvements regardless.

    • @crc-error-7968
      @crc-error-7968 months ago +1

      @@Jims-Garage So, about 54 W/unit.
      Honestly, I thought they would consume much less when idle..
      I'm looking for a solution similar to yours to downgrade (in terms of consumption) and upgrade (in terms of CPU) my homelab, but it's easy to make a mistake since there is an ocean of possibilities between mini PCs and components to choose from.

    • @Jims-Garage
      @Jims-Garage  months ago

      Idle consumption is almost pointless because if it's idle turn it off. The figures I gave were running around 12VMs across all nodes.

    • @crc-error-7968
      @crc-error-7968 months ago

      @@Jims-Garage Sorry, by idle I mean doing nothing heavy: something like MQTT receiving data, Home Assistant doing its stuff, Uptime Kuma monitoring services, Traefik waiting for new requests to redirect, Jellyfin waiting for me to choose something to watch, etc.
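The per-unit figure in this thread is the quoted 160 W divided by three nodes, which works out to roughly 53 W each. A short sketch extends it to yearly energy, assuming the draw stays constant around the clock; the electricity price is left out because it varies by region.

```python
def per_node_watts(total_watts: float, nodes: int) -> float:
    return total_watts / nodes


def annual_kwh(watts: float, hours_per_year: float = 24 * 365) -> float:
    """Energy used in a year if the load stays constant."""
    return watts * hours_per_year / 1000


total = 160.0  # whole-cluster draw quoted above, running about 12 VMs
print(f"per node: {per_node_watts(total, 3):.1f} W")
print(f"per year: {annual_kwh(total):.0f} kWh")
```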

  • @fasti8993
    @fasti8993 months ago +1

    Can you provide information on CPU utilization when the Thunderbolt links are doing maximum traffic? If I get it right, you are kind of sacrificing a little bit of your CPU as an Ethernet controller… This might play a role in deciding which version of the machine to buy, because a dual 25G NIC goes for 50 bucks on eBay…

    • @Jims-Garage
      @Jims-Garage  months ago

      I'll take a look at it and post online

  • @lmaguire
    @lmaguire months ago +1

    Pretty sure your v4 issue is the fact that you're trying to set a /32 on those. If you go back to old-school subnetting and remember how ANDing works, I believe you'll see that each of those is going to think it's on its own isolated network.

    • @scotteecarr
      @scotteecarr months ago

      I agree. Change the /32 netmask to /24 or something to include more addresses.

    • @scytob
      @scytob 29 days ago

      the advantage of a /32 is it fixes the subnet as a single IP - its a very effective strategy to hard code an IP when one can't.
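The replies above describe the same behavior from different angles, and Python's ipaddress module makes it concrete. In this sketch the addresses are hypothetical: a /32 interface's network contains only itself, so a peer is never on-link and reachability has to come from routes, which in this setup OpenFabric (via FRR) is meant to advertise and install. A wider mask would instead put both addresses on one subnet.

```python
import ipaddress

# Hypothetical loopback addresses for two nodes, each configured as a /32.
a = ipaddress.ip_interface("10.0.0.81/32")
b = ipaddress.ip_interface("10.0.0.82/32")

# The /32 "network" is just the host itself.
print(a.network, a.network.num_addresses)
# b is not on-link for a, so traffic to b needs a route (e.g. from OpenFabric).
print(b.ip in a.network)

# With a wider mask both addresses would share one subnet instead.
wide = ipaddress.ip_interface("10.0.0.81/24")
print(b.ip in wide.network)
```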

  • @shephusted2714
    @shephusted2714 months ago +3

    do some benchmarks

    • @Jims-Garage
      @Jims-Garage  months ago

      What would you like to see?

  • @djvincon
    @djvincon months ago +1

    FOR THE EMPEROR!

  • @ClacKeyTech
    @ClacKeyTech months ago +1

    I recently built two Pro Aggregation switches into my school's network.

    • @Jims-Garage
      @Jims-Garage  months ago

      That's awesome. Must be great for that kind of environment.

    • @ClacKeyTech
      @ClacKeyTech months ago +1

      @@Jims-Garage absolutely

  • @michaeljolley6773
    @michaeljolley6773 months ago +1

    This may be a dumb question, but can I do this using USB 3.1 Gen 2? I don't have USB-C ports, but at least USB 3.1 Gen 2 has faster speeds than my single 1 Gb LAN port on each node.

    • @Jims-Garage
      @Jims-Garage  months ago

      I don't believe so, you can do it with thunderbolt 3 though I think.

    • @scytob
      @scytob 29 days ago

      USB3 doesn't support XDOMAIN specification so cannot use XDOMAIN networking (aka thunderbolt). USB4 is actually thunderbolt 3+some extensions (TB4 has more mandatory items).

  • @akurenda1985
    @akurenda1985 months ago +1

    That's some great network speed, but I can't help noticing you're just using local LVM storage. So no ZFS replication or Ceph? Seems like a waste to use the 25-gigabit network just for migrating VMs.

    • @Jims-Garage
      @Jims-Garage  months ago

      Thanks. As I mention in the video I'll be doing Ceph in the near future (want the infrastructure in place first).

  • @NetBandit70
    @NetBandit70 months ago +4

    My-craw-tic
    Wat!?!

    • @Jims-Garage
      @Jims-Garage  months ago

      How is it supposed to be pronounced?

  • @BenjaminBenStein
    @BenjaminBenStein months ago +1

    🎉

  • @casperghst42
    @casperghst42 months ago

    As for the dropouts, Wayne Fox made some tests using CalDigit and Apple (Pro) cables and found that non-active cables might not always be perfect (th-cam.com/video/BcX8yeqf_5w/w-d-xo.htmlsi=siTWqYroIBC6fzL4); there is a price difference, though.

    • @scytob
      @scytob 29 days ago

      I use short OWC TB4 cables, work perfectly.

  • @Andy-fd5fg
    @Andy-fd5fg months ago +1

    Hmm, mesh networking... now all I need is enough computers to try this on.

    • @Jims-Garage
      @Jims-Garage  months ago

      I was really surprised with how good it is!

  • @ws_stelzi79
    @ws_stelzi79 months ago

    Man, how do you pronounce MikroTik... 🤔

    • @Jims-Garage
      @Jims-Garage  months ago

      How's it supposed to be pronounced? 😂

    • @headlibrarian1996
      @headlibrarian1996 26 days ago

      Micro-tick was my assumption.

    • @cryptodendrum
      @cryptodendrum 2 days ago

      @@headlibrarian1996 I've always heard it pronounced Micro-tik too, but from now on, it shall officially always be called My-craw-tic. I'm going to use that everywhere now. :D

  • @nirv
    @nirv months ago

    Why is there a dollar sign in the center of a heart and "thanks?" Why are there affiliate links in the description?
    No dude. You are violating the spirit of the internet. Get these dollar signs out of my face. This is the INTERNET. What are you doing dude?