Thunderbolt Networking is FAST & CHEAP!

Jim's Garage

31,270 views

Published on 20 Nov 2024

Comments • 136

@rafrackowiak 5 months ago ⁺¹⁵
This work of yours is spectacular!
@Jims-Garage 5 months ago ⁺¹
Thanks, glad it was useful! Anything you want to know give me a shout and I'll try to answer.
@ob2522 5 months ago ⁺¹
tmux is great for running stuff in the background, I use it all the time for switching between nodes in my cluster.
Thanks for a fascinating vid!
@Jims-Garage 5 months ago
Thanks, tmux is on the list 👍
@scotteecarr 5 months ago ⁺¹
This is an incredible bit of kit. I wish these MS-01s had existed when I built my cluster. I'm pretty sure you'll need more than 3 nodes if you plan to run Ceph, as 3 is the bare minimum count. You don't want split brain, and 3 nodes won't allow you to perform maintenance.
@scytob 5 months ago ⁺²
Works fine with one node down. I found that out the hard way, lol: it stayed working, performance was great, and it gave me time to get node 3 back up and running.
@LtdJorge 4 months ago
There is no split brain with 3 nodes. As long as you have it set up as size = 3, min_size = 2, it will run with 2 nodes up in degraded mode. If there was a network split where you’d get 2/1 nodes separated, the group with the 2 nodes can continue running while the group with 1 cannot.
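The reasoning in this reply can be sketched in shell as two simplified checks: a partition needs a strict majority of nodes, and a pool keeps serving I/O while at least min_size replicas are reachable. The helper names and the pool name `mypool` are illustrative assumptions, and the model deliberately ignores everything else Ceph considers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: does a partition holding `up` of `total` nodes
# still have a strict majority (the rule that prevents split brain)?
has_majority() {
    local up=$1 total=$2
    [ $(( up * 2 )) -gt "$total" ]
}

# Hypothetical helper: can a pool keep serving I/O with `replicas_up`
# copies reachable, given its min_size setting?
pool_serves_io() {
    local replicas_up=$1 min_size=$2
    [ "$replicas_up" -ge "$min_size" ]
}

# Walk through a 3-node cluster with min_size = 2.
for up in 1 2 3; do
    if has_majority "$up" 3 && pool_serves_io "$up" 2; then
        echo "$up of 3 nodes up: cluster keeps running"
    else
        echo "$up of 3 nodes up: I/O stops"
    fi
done

# The pool settings themselves would be applied with the Ceph CLI
# ("mypool" is a placeholder pool name):
#   ceph osd pool set mypool size 3
#   ceph osd pool set mypool min_size 2
```

In a 2/1 network split, only the side with two nodes passes both checks, which is why the single node stops rather than diverging.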
5 months ago ⁺²
Thank you for breaking it down. Answered all the questions I didn't even know I had!
@Jims-Garage 5 months ago
Glad it was helpful!
@LaurenceHartje 5 months ago ⁺⁹
Thanks! I've been considering this exact same setup since the MS-01 was announced, but wanted to see some real-life experiences before financially committing to it (as you mentioned, Thunderbolt Ethernet was in a strange state at that time). Looking forward to seeing the Ceph results and performance over the Thunderbolt Ethernet.
@Jims-Garage 5 months ago ⁺²
Thanks, glad it was useful. I'll be covering that part soon.
@squidiebah 5 months ago ⁺³
Thanks for the video, I've been wanting to use USB4 between two Linux cluster nodes for a while and was curious about stability. Thankfully, the thunderbolt-net module supports networking.
@Jims-Garage 5 months ago ⁺¹
Great to hear! Let me know how that goes. It's a great way to have fast, "cheap" networking.
@Mad-Jam 5 months ago ⁺¹⁰
Search on YT for a video called "USB4 & Thunderbolt's TRUE Speed: Only 22Gbps!"
He nailed it.
@Jims-Garage 5 months ago
Thanks, I will take a look at it.
@scytob 5 months ago ⁺²
It's 26Gbps on Intel NUCs; it varies between systems due to the DMA controller. The MS-01 only appears to be 20Gbps USB4, not 40Gbps TB4, which is why you see a difference. It doesn't really matter, as that 20% doesn't make much real-world difference.
@Jims-Garage 5 months ago
@@scytob assuming that's because the 4 lanes are split 2x2 with ports? Traditionally you'd have a full 4 lanes?
@scytob 5 months ago
@@Jims-Garage I believe it's a combination of how many retimers are used and Gen3x1 vs Gen3x2 PCIe. For example, one could have a Genx2 but not implement enough retimers, along with how much bandwidth is allocated to interdomain vs tunneling, and whether it is fixed or dynamic on a specific implementation. Also, shh, USB4 is not USB (aka serial); it is actually a routed protocol that supports DP, USB3 and PCIe tunneling. This is why your statement that Thunderbolt networking was deprecated isn't correct. It just wasn't being well tested because they were focused on getting the core USB4 interdomain routing working first. I found some critical interdomain bugs that needed to be fixed, and they fixed the IPv6 bugs as part of that. Anyhoo, this is all based on my (mis?)reading of the USB4 specification documents. Bandwidth allocation is documented on page 60 of the connection manager guide for USB4 (it's a heavy read, lol). Also look at the interdomain service guide to see why Thunderbolt networking was absolutely foundational too.
@scytob 5 months ago
@@Jims-Garage or to say it another way, the bandwidth allocation between PCIe / USB3 / DP tunneling etc. is done dynamically across all ports. The lanes are not split per port; it's the total number of lanes that matters, nothing else (unless someone does something silly like one Thunderbolt controller per port...).
@ryanmalone2681 5 months ago ⁺¹
Like the sweater, but I’m really missing the pastel cardigans. Video isn’t bad either 😉. Appreciate you.
@Jims-Garage 5 months ago
Haha, thanks. I will try to sort out my attire ASAP
@BZFFirst 5 months ago ⁺⁴
Hi. Would you do a video on how to configure SR-IOV for the Minisforums, and how to use it in VMs?
@Jims-Garage 5 months ago ⁺³
Yes, I'll likely cover it in the near future
@scytob 5 months ago ⁺¹
@@Jims-Garage even I want this, I can't get that to be reliable...
@jacobburgin826 5 months ago ⁺³
Most of us are struggling with IPv4, especially after reboot. I ended up taking it out and just using IPv6.
Restarting the frr service usually brings IPv4 back up, but rebooting the system kills it again.
@scytob 5 months ago
IPv4 takes longer to converge. I never figured out why, as IPv6 is better suited to this task IMO.
@uvalleza 5 months ago ⁺³
Jim, in your running config I am not seeing "ip router openfabric 1", only "ipv6 router openfabric 1", which explains why you can't use IPv4 (this is in the interface en05/en06 section when you showed "show running-config"). But FYI, I went through this like 3 times because for some reason it sometimes doesn't take the setting, so I have to redo it until it picks it up. I did this for my UM790 Pros and have MS-01s on the way! So can't wait to set it up like this :) Just discovered your channel and loving the content, man. Keep it up!
@Khaldrogo5 5 months ago ⁺¹
Happened to me too. The first time I type "ip router openfabric 1" for the FRR config, it gives the error "Couldn't bring up interface, please check log.". If you just re-enter "ip router openfabric 1", the error goes away and it gets added to the config.
@scytob 5 months ago ⁺²
Nice find. Doing cat /etc/frr/frr.conf will show what FRR has committed (note: NEVER edit this file if it is wrong, use the FRR console).
@scytob 5 months ago ⁺¹
I made some edits to the gist that might avoid this. It's unclear why this broke on some systems (I wonder if it is a bug). The key is that vtysh -c "show running-config" will show whether the config took or not.
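The check described in this thread can be scripted. The following is a minimal sketch, assuming the running config is piped in on stdin and that both statements quoted above are wanted. The function name `check_openfabric` is made up for illustration and the matching is deliberately simple.

```shell
#!/usr/bin/env bash
# Reads an FRR running config on stdin and reports which of the two
# OpenFabric interface statements quoted in the thread are missing.
check_openfabric() {
    local config stmt missing=0
    config=$(cat)
    for stmt in "ip router openfabric 1" "ipv6 router openfabric 1"; do
        # Strip leading indentation, then require an exact whole-line match.
        if ! printf '%s\n' "$config" | sed 's/^[[:space:]]*//' | grep -qxF "$stmt"; then
            echo "missing: $stmt"
            missing=1
        fi
    done
    return "$missing"
}

# On a node, the input would come from the FRR console:
#   vtysh -c "show running-config" | check_openfabric
```

A non-zero exit status means at least one statement did not take and needs to be re-entered through the FRR console.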
@JeramiFrost 5 months ago ⁺²
This may be inconsequential, but at roughly 25:50 you are talking about using the IPv4 link you created, but the actual page that shows your migration settings shows the IPv6 address selected. Does this matter?
Edit: After going back to the beginning of the video and doing the ping test, I answered my own question. I could not ping with the IPv4 selection. I switched it to IPv6 like yours showed and I was able to ping successfully.
Thanks for an awesome and super helpful video.
@Jims-Garage 5 months ago
You're welcome, thanks 👍
@ewenchan1239 5 months ago ⁺⁷
Thunderbolt networking is cheap if you have three MS-01s, which are at least $439 USD barebones or $629 USD for the Core i5 12600H version with 32 GB of RAM and a 1 TB SSD. But if you want the top-of-the-line, pre-configured model from Minisforum, each node will set you back $829 USD, which means that three nodes plus Thunderbolt cables will run you closer to $2500 USD in total.
@Jims-Garage 5 months ago ⁺³
True for the MS-01, but many consumer devices have a couple of Thunderbolt ports. It also means you don't have to buy additional adapters.
@ewenchan1239 5 months ago ⁺¹
@@Jims-Garage
Varies.
Depends on what it is, and its respective age.
My 7th gen NUC - no.
8th gen NUC has one Thunderbolt 3 port on it.
None of the rest of my systems, not even my Asus Z690 Prime P motherboard (for 12th gen Intel), has Thunderbolt on it.
So it REALLY depends.
@scytob 5 months ago ⁺²
For folks who have already decided to use NUCs with TB form factors, this is basically 'free' compared to buying 3 TB 10GbE adapters...
@scytob 5 months ago
@@ewenchan1239 and none of that invalidates that it's basically free for folks who are making their purchase decisions now...
@ewenchan1239 5 months ago
@@scytob
If you have a NUC that only has a single TB port, then you won't be able to do this (per Wendell) in terms of being able to set up a near token ring type of network using nCr(3,2) combinations of ports on systems. The best that you'd be able to do with a single port on each NUC is a single point-to-point connection between one pair, which means that the 3rd system will NOT be connected to said Thunderbolt-based high(er) speed network.
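The port arithmetic behind this point can be written out. This is a small sketch, assuming a full mesh in which every node is cabled directly to every other node. The function names are invented for the example.

```shell
#!/usr/bin/env bash
# Cables needed for a full mesh of n nodes: n choose 2.
mesh_links() {
    local n=$1
    echo $(( n * (n - 1) / 2 ))
}

# Thunderbolt ports each node needs in that mesh.
ports_per_node() {
    local n=$1
    echo $(( n - 1 ))
}

echo "3 nodes: $(mesh_links 3) cables, $(ports_per_node 3) ports per node"
```

For three nodes that is three cables and two ports per node, so a machine with a single Thunderbolt port can only join one point-to-point link.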
@wesleyelder 5 days ago ⁺¹
Hey Jim, have you tried to bridge the Thunderbolt network? While it's great for Ceph to sync and move stuff quickly between nodes, I also wanted to use it in VMs/CTs. I guess I need to create a bridge, then bind it to an interface, but I can't for the life of me get it working.
@Jims-Garage 2 days ago
That should be possible, the problem being trying to attach it to a switch/router. Point to point should work normally.
@xenos1983 4 months ago
Hi Jim
Thank you for your very comprehensive video. I really appreciate your work.
Do you know if this will also work with Thunderbolt 3 on an older NUC7i5BNB?
@xenos1983 4 months ago
After some testing I've gotten it working. I had trouble obtaining the correct PCI path because it was nested in a longer path on my old NUC (12Gbps).
This command gave me the correct result for the path:
udevadm info /sys/class/net/thunderbolt0 | grep ID_PATH
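As a rough sketch of how that output might be used, the helper below pulls out only the ID_PATH value, and a second helper renders a systemd `.link` file that renames the interface. The function names, the sample PCI path, and the assumption that the guide renames interfaces via a `.link` file matching on `Path=` and `Driver=thunderbolt-net` are illustrative and not confirmed by this thread.

```shell
#!/usr/bin/env bash
# Extracts the ID_PATH value from `udevadm info` output read on stdin.
# Property lines look like "E: ID_PATH=..."; the related ID_PATH_TAG
# line is skipped because the key must match exactly.
extract_id_path() {
    sed -n 's/^E: ID_PATH=//p'
}

# Renders a systemd .link file that renames the matching interface.
# The Driver= value is an assumption; adjust it to your system.
render_link_file() {
    local path=$1 name=$2
    printf '[Match]\nPath=%s\nDriver=thunderbolt-net\n\n[Link]\nMACAddressPolicy=none\nName=%s\n' \
        "$path" "$name"
}

# On a node (not run here):
#   path=$(udevadm info /sys/class/net/thunderbolt0 | extract_id_path)
#   render_link_file "$path" en05
```

On hardware where the controller sits behind a longer nested path, the extracted value is simply longer. The helper does not try to shorten or interpret it.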
@lmaguire 5 months ago ⁺¹
Pretty sure your IPv4 issue is that you’re trying to set a /32 on those. If you go back to old-school subnetting and remember how ANDing works, I believe you’ll find that each of those is going to think it’s on its own isolated network.
@scotteecarr 5 months ago
I agree. Change the /32 netmask to /24 or something to include more addresses.
@scytob 5 months ago
The advantage of a /32 is that it fixes the subnet as a single IP. It's a very effective strategy to hard-code an IP when one otherwise can't.
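The subnet sizes being debated can be computed directly. This is a minimal sketch for IPv4 prefixes, with a made-up function name. It only counts addresses and says nothing about whether a /32 is the right choice here.

```shell
#!/usr/bin/env bash
# Number of IPv4 addresses covered by a prefix length (0-32).
addresses_in_prefix() {
    local prefix=$1
    echo $(( 1 << (32 - prefix) ))
}

for p in 32 30 24; do
    echo "/$p covers $(addresses_in_prefix "$p") address(es)"
done
```

A /32 covers exactly one address, so no peer is ever treated as on-link. Reaching the other nodes then depends on routes installed by the routing daemon, which fits both sides of this exchange.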
@TradersTradingEdge 5 months ago ⁺²
Holy smoke :) That's awesome. Thanks for that.
@Jims-Garage 5 months ago
Thanks 👍
@TheMongolPrime 5 months ago ⁺¹
Love the node names. I knew I liked you.
@Jims-Garage 5 months ago
Haha, thanks! For The Emperor and Sanguinius!
@MarkConstable 5 months ago ⁺¹
I've been using Ceph on my little mongrel cluster since early last year and it's been fine on a 2.5GbE fabric (started off with 1GbE).
BTW, you mentioned passing through a GPU to containers in a k3 cluster so I suspect you are using SR-IOV and not full iGPU passthrough.
@Jims-Garage 5 months ago ⁺²
For now it's full passthrough with 3 agent nodes. I will be looking into SR-IOV but with kernel 6.8 the custom DKMS doesn't work.
@MarkConstable 5 months ago
@@Jims-Garage Cool. It would be really excellent if you could detail your exact settings somewhere because I'm not aware of anyone else who has cracked this particular procedure.
@cberthe067 5 months ago
Is SR-IOV able to share a GPU among VMs/LXC containers?
@simo47768 5 months ago ⁺²
Wow. Amazing video again.
@Jims-Garage 5 months ago
Glad you enjoyed it!
@mihaitamas 5 months ago ⁺¹
For max performance you will need to set affinity on the performance cores for iperf3. That will get you to 26Gbits/s across the board.
@Jims-Garage 5 months ago
I believe for the MS-01 it's due to the PCIe lanes being shared between ports and DMA, not CPU affinity.
@mihaitamas 5 months ago
@@Jims-Garage I highly doubt it, as before setting the affinity I could not get more than 18-21Gbps, and with a LOT of retries on one of the nodes, like >8K retries on a 60 sec iperf test. What I am trying to say is that you might be lucky, but it's better to choose and have consistent performance rather than try to guess what's wrong. ;)
@bartomiejlesniowski8635 3 months ago
@@Jims-Garage, @mihaitamas is right, try using this on each node:
#!/bin/bash
# Pin every thunderbolt IRQ to CPUs 0-3 (hex mask 0f); needs root.
for id in $(grep 'thunderbolt' /proc/interrupts | awk '{print $1}' | cut -d ':' -f1); do
    echo 0f > "/proc/irq/$id/smp_affinity"
done
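The `0f` in that script is a hexadecimal CPU bitmask. Here is a short sketch of how such a mask can be derived from a list of CPU numbers, in case the performance cores on a given machine are not CPUs 0-3. The function name is invented, and which CPU numbers map to performance cores is system-specific, so check it first (for example with `lscpu --extended`).

```shell
#!/usr/bin/env bash
# Builds the hex bitmask expected by /proc/irq/<n>/smp_affinity
# from a list of CPU numbers (bit N set means CPU N is allowed).
cpu_mask() {
    local mask=0 cpu
    for cpu in "$@"; do
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpu_mask 0 1 2 3            # same CPUs as the 0f mask in the script
cpu_mask 0 1 2 3 4 5 6 7    # eight CPUs
```

The kernel accepts the value with or without a leading zero, so `f` and `0f` select the same four CPUs.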
@chriwas months ago
@Jims-Garage I tried the script from @bartomiejlesniowski8635 and throughput improved from 13.4 Gb/s to 21.9 Gb/s on three Intel NUC 12 Pros. However, the changes do not survive a reboot. I have to find a way to run the script automatically after booting. Any advice on the right way to do that?
@chriwas months ago
I figured out how to run the script consistently at boot. I used a systemd service that runs the script at the end. Without a delay, the thunderbolt interrupts were not consistently assigned only to the performance CPUs.
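The exact unit was not posted, so the following is only a guess at its shape: a oneshot service that sleeps before running the affinity script. The script path, the 30-second delay, and the unit name are all placeholders.

```shell
#!/usr/bin/env bash
# Prints a oneshot systemd unit that waits `delay` seconds and then
# runs the affinity script. Both arguments are placeholders to adapt.
render_affinity_unit() {
    local script=$1 delay=$2
    cat <<EOF
[Unit]
Description=Pin thunderbolt IRQs to performance cores

[Service]
Type=oneshot
ExecStartPre=/bin/sleep ${delay}
ExecStart=${script}

[Install]
WantedBy=multi-user.target
EOF
}

render_affinity_unit /usr/local/bin/tb-irq-affinity.sh 30

# To install (as root, not run here):
#   render_affinity_unit /usr/local/bin/tb-irq-affinity.sh 30 \
#       > /etc/systemd/system/tb-irq-affinity.service
#   systemctl daemon-reload
#   systemctl enable --now tb-irq-affinity.service
```

The sleep is a blunt way to wait until the thunderbolt interrupts exist. The right delay depends on the machine.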
@ajayganeshie1857 4 months ago ⁺¹
Hi there Jim, how are you? Is this also possible with fast USB?
Thank you!
Ajay
@Jims-Garage 4 months ago
@@ajayganeshie1857 I believe it works with USB4, as that's effectively Thunderbolt 4 (otherwise I haven't tested, as I don't have the means to).
@antonio.luevano 2 months ago ⁺¹
How did you get the lo:0 and lo:6 to show up in the GUI? I followed the instructions and the Thunderbolt link is working, but it is not being seen in the UI. Great tutorial. TIA.
@Jims-Garage 2 months ago
@@antonio.luevano I will have to check. Since recording there has been a Proxmox update and changes to the methodology.
@antonio.luevano 2 months ago
@@Jims-Garage I appreciate your response. I really enjoy your videos, you are very thorough with the steps and concepts.
@cryptodendrum 5 months ago ⁺¹
I wonder how Apple Thunderbolt 4 cables would perform? Would that cure your retransmission error count? Or would the observed netlink speed be any faster?
@Jims-Garage 5 months ago ⁺¹
I can't imagine it would make a difference, it's a reputable cable that meets the specifications. The speed is likely limited by 2x PCIe lanes and DMA.
@cryptodendrum 5 months ago ⁺¹
@@Jims-Garage Thanks for the advice. My Cable Matters branded Thunderbolt cables arrived, along with my RAM and NVMe drives. Just waiting for the computers for our new cluster to arrive now. Looking forward to getting started with it.
FYI - it'll replace an old Protectli i5 unit from 2018 and two old Mac Minis, all of which run XCP-NG currently, and I plan to try to run XCP-NG on my new MS-01 cluster.
@JonatanCastro 4 months ago ⁺¹
I actually tried with my Apple Studio Display cable and it's the same speed and retries
@Jims-Garage 4 months ago
@@JonatanCastro good to know
@francoismartin5578 5 months ago ⁺¹
Hello, thanks a lot for this great video ❤
@Jims-Garage 5 months ago
Thanks for visiting
@djmillhaus 4 months ago ⁺²
You're using TB4 cables, but I wonder whether USB4 40Gbps cables that are "compatible with TB3" would do the job as well?
@Jims-Garage 4 months ago
AFAIK TB4 and USB4 are the same.
@simonthornroos752 3 months ago ⁺²
@@Jims-Garage @djmillhaus I actually tried with USB4 cables and those didn't get recognized by the MS-01 on any of my 3 nodes. I have just placed an order for TB4 cables. Fingers crossed those will work!
@Jims-Garage 3 months ago ⁺¹
@@simonthornroos752 good to know, thanks
@totoro1596 5 months ago ⁺¹
Cool stuff! Why LVM instead of ZFS?
@Jims-Garage 5 months ago ⁺¹
For speed, this is the only 4x4 slot. I also back up daily and replicate so infrastructure isn't a problem.
@shephusted2714 5 months ago ⁺³
do some benchmarks
@Jims-Garage 5 months ago
What would you like to see?
@cberthe067 5 months ago
When setting up Ceph, use erasure coding rather than replication to maximize disk space. You may need to create it manually with a shell command since, if I remember correctly, Proxmox only supports replication.
@scytob 5 months ago ⁺¹
Erasure coding has its downsides in terms of latency and reliability in a 3-node cluster.
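The space trade-off in this exchange comes down to simple arithmetic. The sketch below compares usable capacity for a replicated pool and an erasure-coded pool. The k=2, m=1 profile and the 1000 GB per node figure are chosen purely for illustration, and real clusters lose further space to overhead and safety margins.

```shell
#!/usr/bin/env bash
# Usable capacity of a replicated pool: raw space divided by the
# number of copies kept (the pool's size).
usable_replicated() {
    local raw=$1 size=$2
    echo $(( raw / size ))
}

# Usable capacity of an erasure-coded pool with k data chunks and
# m coding chunks: the fraction k / (k + m) of raw space.
usable_erasure() {
    local raw=$1 k=$2 m=$3
    echo $(( raw * k / (k + m) ))
}

raw=3000   # e.g. three nodes with 1000 GB each (illustrative)
echo "replicated, size 3: $(usable_replicated "$raw" 3) GB"
echo "erasure k=2 m=1:    $(usable_erasure "$raw" 2 1) GB"
```

With these numbers erasure coding doubles the usable space, but a k=2, m=1 layout on three nodes tolerates only one failure and has no spare node to rebuild onto, which is the reliability concern raised in the reply.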
@fasti8993 5 months ago ⁺¹
Can you provide information on CPU utilization when the Thunderbolt links are carrying maximum traffic? If I get it right, you are kind of sacrificing a little bit of your CPU as an Ethernet controller. This might play a role in deciding which version of the machine to buy, because a dual 25G NIC goes for 50 bucks on eBay.
@Jims-Garage 5 months ago
I'll take a look at it and post online
@NerdzNZ 4 months ago ⁺¹
I am looking to do this with my MS-01s but I am not running them with Proxmox, I am using Debian 12. Any chance you have notes on what would be different for that approach?
@Jims-Garage 4 months ago
@@NerdzNZ the process should be identical, Proxmox 8 is Debian 12 under the hood.
@NerdzNZ 4 months ago ⁺¹
@@Jims-Garage This bit tripped me up, because those are very Proxmox NIC names:
allow-hotplug en05
iface en05 inet manual
    mtu 65520
iface en05 inet6 manual
    mtu 65520
allow-hotplug en06
iface en06 inet manual
    mtu 65520
iface en06 inet6 manual
    mtu 65520
Mine in the MS-01 look more like enp87s0, so I wasn't sure if it was "safe". This is new territory for me and networking is not a strong point.
@Jims-Garage 4 months ago
@@NerdzNZ eno and enp are just naming conventions that depend on how the devices are detected. Just amend the instructions accordingly. Should be fine.
@akurenda1985 5 months ago ⁺²
That's some great network speed, but I can't help but see you're just using local LVM storage. So no ZFS replication or Ceph? Seems like a waste to use the 25 gigabit network just for migrating VMs.
@Jims-Garage 5 months ago
Thanks. As I mention in the video I'll be doing Ceph in the near future (want the infrastructure in place first).
@crc-error-7968 5 months ago ⁺¹
Ciao Jim, what is the power consumption of the 3 MS-01s? Idle, with a few VMs, etc.?
@Jims-Garage 5 months ago ⁺¹
This cluster is currently pulling 160W with what you see. I suspect you could get lower with core pinning but I haven't done that yet. I simply wanted to migrate over ASAP as that would give me power savings and speed improvements regardless.
@crc-error-7968 5 months ago ⁺¹
@@Jims-Garage So, about 53 W/unit.
Honestly, I thought they would consume much less when idle.
I'm looking for a solution similar to yours to downgrade (in terms of consumption) and upgrade (in terms of CPU) my homelab, but it's easy to make a mistake since there is an ocean of possibilities between mini PCs and components to choose from.
@Jims-Garage 5 months ago
Idle consumption is almost pointless because if it's idle, turn it off. The figures I gave were running around 12 VMs across all nodes.
@crc-error-7968 5 months ago
@@Jims-Garage sorry, by idle I mean doing nothing heavy: something like MQTT receiving data, Home Assistant doing its stuff, Uptime Kuma monitoring services, Traefik waiting for new requests to redirect, Jellyfin waiting for me to choose something to watch, etc.
@djvincon 5 months ago ⁺¹
FOR THE EMPEROR!
@潜水屋 2 months ago ⁺¹
Does Thunderbolt networking work between Mac and PVE, or between Windows and PVE?
@Jims-Garage 2 months ago
Good question, I don't know. I suspect it could do but I'm only using it for networking between nodes and I don't have a Mac to test with
@jgarfield 3 months ago ⁺¹
What do you use for diagramming?
@Jims-Garage 3 months ago
This is draw.io
@ClacKeyTech 5 months ago ⁺¹
I recently built 2 Pro Aggregation switches into my school's network
@Jims-Garage 5 months ago
That's awesome. Must be great for that kind of environment.
@ClacKeyTech 5 months ago ⁺¹
@@Jims-Garage absolutely
@NetBandit70 5 months ago ⁺⁴
My-craw-tic
Wat!?!
@Jims-Garage 5 months ago
How is it supposed to be pronounced?
@michaeljolley6773 5 months ago ⁺¹
This may be a dumb question but can I do this using USB 3.1 Gen 2? I don't have USB-C ports but at least USB 3.1 Gen 2 has faster speeds than the single 1Gb LAN port on each node.
@Jims-Garage 5 months ago
I don't believe so, you can do it with Thunderbolt 3 though I think.
@scytob 5 months ago
USB3 doesn't support the XDOMAIN specification so it cannot use XDOMAIN networking (aka Thunderbolt). USB4 is actually Thunderbolt 3 + some extensions (TB4 has more mandatory items).
@Benjamin-rd7xi 2 months ago ⁺¹
Hey there, I noticed something at th-cam.com/video/Tb3KH2RbsTE/w-d-xo.html where you show the topology. In the guide, the interface for the other nodes switches between en05 and en06. In your topology you are only using en06, so the next hop to sanguinius is dorn, but it really should be sanguinius. So if dorn is down, your connection between the other two will be broken. Maybe this also helps with IPv4.
@Jims-Garage 2 months ago
@@Benjamin-rd7xi thank you, I will double check that to see if it's still an issue
@BenjaminBenStein 5 months ago ⁺¹
🎉
@Andy-fd5fg 5 months ago ⁺¹
hummm mesh networking... now all I need is enough computers to try this on
@Jims-Garage 5 months ago
I was really surprised with how good it is!
@casperghst42 5 months ago
As for the dropouts, Wayne Fox ran some tests using CalDigit and Apple (Pro) cables, and found that non-active cables might not always be perfect (th-cam.com/video/BcX8yeqf_5w/w-d-xo.htmlsi=siTWqYroIBC6fzL4) - there is a price difference though.
@scytob 5 months ago
I use short OWC TB4 cables, work perfectly.
@ws_stelzi79 5 months ago
Man, how you pronounce Mikrotik... 🤔
@Jims-Garage 5 months ago
How's it supposed to be pronounced? 😂
@headlibrarian1996 5 months ago
Micro-tick was my assumption.
@cryptodendrum 4 months ago
@@headlibrarian1996 I've always heard it pronounced Micro-tik too, but from now on, it shall officially always be called My-craw-tic. I'm going to use that everywhere now. :D
@nirv 5 months ago
Why is there a dollar sign in the center of a heart and "thanks"? Why are there affiliate links in the description?
No dude. You are violating the spirit of the internet. Get these dollar signs out of my face. This is the INTERNET. What are you doing dude?
