Glad you went with real distributed filesystems. As the author of mergerfs I've seen people try to use mergerfs + NFS/SMB/etc. to build out a "poor man's remote filesystem" and while it can work OK if you don't have sufficient hardware to dedicate to just a filesystem it certainly is an abuse of said technologies.
I can see how MergerFS would get ugly very fast if trying to use it like Gluster. I think distributed storage is cool, but it definitely needs to be planned and set up correctly to work right.
Thanks for your work on MergerFS, it's great to have as an option.
Your content has always been good, but as of late I really need to say...
You've been consistently producing higher and higher quality content!
Your explanations are informative, engaging, smooth and paced well even for new users!
Thank you for sharing your knowledge with the world!
Years ago I really wanted to move to those 5TB 2.5" drives. Less space. Less noise. Less power use. Unfortunately, the reliability was not great in my experience and the cost went up over time rather than down.
Good video. I'm worried though about down the road, when the pool starts filling up past ~80% and drives start dying. SMR drives may seem to work, until they don't. They're just horrible tech that nobody asked for, when CMR drives fill the role perfectly fine.
IMHO, your client would be better off tossing these 5TB SMR drives, since they're a ticking time bomb, and rebuilding with SSDs, or buying a SAS disk shelf and rebuilding with 3.5-inch NAS-rated or SAS disks for the long term. But at any rate, they're going to need regular, reliable BACKUPS.
Yeah, I worry about when the drives are full too. I didn't fully test rebuilds with full drives, but I'll follow up with them to see how it's holding up. I am also curious personally what the failure rate will look like on these drives, as I have heard horror stories about them failing early, but no good stats on these models.
I also experimented with Ceph on a Dell R720 and Proxmox a year ago. I have SATA SSD drives and need high random read performance. Also tested NVMe drives. But performance was always very poor: I measure it in IO operations per second, and it was always a factor of 10 worse than local ZFS. The network should not be the problem, as I also have 40G InfiniBand. (Some magical internet posts also declare that RDMA should be possible. I could use it in BeeGFS, but in Ceph there was no documented way to get it running.) So my solution is to use ZFS and replication. BeeGFS was really fast, but there is no parity possible. Or do you think there could be a way to get Ceph on SSD faster?
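For anyone wanting to reproduce this kind of IOPS comparison, a benchmark along these lines with fio is one common approach; the file path, size, and queue depths below are placeholders, not the commenter's actual test parameters:

```shell
# Hypothetical fio run measuring 4k random-read IOPS.
# Point --filename at a file on the filesystem under test
# (e.g. a local ZFS dataset vs. a mounted CephFS/RBD volume)
# and compare the reported IOPS between runs.
fio --name=randread \
    --ioengine=libaio --direct=1 \
    --rw=randread --bs=4k \
    --iodepth=32 --numjobs=4 --group_reporting \
    --size=1G --runtime=30 --time_based \
    --filename=/mnt/test/fio.bin
```

The `--direct=1` flag bypasses the page cache so the comparison reflects the storage stack rather than RAM.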
You could use zfs as the base (and some cheap optane for zil if sync write with proper speed is needed) and then put longhorn on top of that as shared fs across all servers. It's not as sophisticated as ceph but works well for me.
That said: glusterfs also worked fine for me but now that kubernetes 1.25 deprecated it and truenas dropped it I don't really think I should keep using it anymore in the future. What do you think?
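The Optane-as-ZIL idea mentioned above is a small change on an existing pool; a sketch of what that looks like, with the pool name and device path as placeholders:

```shell
# Sketch: attach a small Optane partition as a SLOG (separate ZIL
# device) to an existing pool named "tank". The by-id path below is
# a placeholder -- use your actual device.
zpool add tank log /dev/disk/by-id/nvme-INTEL_OPTANE-part1

# Confirm the log vdev appears under the pool layout.
zpool status tank
```

Note a SLOG only accelerates sync writes; async writes won't see a difference.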
Do you reckon the slow performance is mostly due to the 10Gbit network and sync, or is Ceph just a bad solution in general for small deployments like this? I've been thinking about moving to a Ceph cluster myself, though at 100Gbit networking speeds, but I would only have a small number of OSDs as well and I'm worried about performance. I could move my HDDs to ZFS and use ZFS HA to keep my storage highly available on the network, which is really all I'm after.
The SMR drives are the limiting factor here I'd say. 10 GBit is totally fine in such a setup. Sure - 100 is better but in terms of price/performance 10gbe is a sweet spot atm I'd say.
The HDDs are the limiting factor here. I did a setup on the same hardware, but with 8 SATA SSDs, and performance was fine (I think about 300MB/s-ish read and write, ~3k random read IOPS). Faster disks, processors, and network would help to improve that, but the disks were the main limit here. I have seen many successful small Ceph setups.
This is an incredible video. I love seeing fellow creators giving all of us new ideas and new ways to do things. Thanks so much for the video, and you have a new sub!!
Ouch, I got shivers when you mentioned 2.5" Seagate 5TB drives, poor dude, what a waste of money. I had several SMR drives in my cluster before I knew better, had to throw them out. Just useless.
It is hard though, to find large 2.5" drives that aren't SMR.
The addiction is real. At first 1TB was way more than I would ever need... Now I'm at 3TB total and it's almost completely full... I think I just keep adding more stuff until eventually it's full, no matter how big it is.
Those are rookie numbers. You've got to bring those up. Altogether I have about 70TB. I'm a data hoarder though, so you might not need anywhere near that. 🤣
I started at 500gb. Now I'm at 25gb. It's an addiction.
@@patzfan8086 looks like you're about to kick your addiction, you downsized to just 5% of what you were using before!
We had the same dilemma, but went a completely different route in the end: an iSCSI SAN. Multiple fat disk servers, each having multiple md RAID arrays, each exported as a separate iSCSI device. All that is imported into a slim iSCSI initiator head server, bound together into a pool through LVM, sliced up into volumes for different shares, and exported to servers and clients through NFS and Samba. It's not clustering, but it's relatively easy to add a box, export its RAID arrays through iSCSI, import them from the head server, and expand the existing pool.
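For readers curious what that head-server side looks like in practice, a rough sketch of the import-and-pool steps; the IP addresses, IQNs, device names, and sizes are all placeholders:

```shell
# Discover and log in to one disk server's iSCSI target from the
# head node (repeat per disk server).
iscsiadm -m discovery -t sendtargets -p 192.168.1.11
iscsiadm -m node -T iqn.2004-01.example:disks1 -p 192.168.1.11 --login

# The imported LUNs show up as local block devices (e.g. /dev/sdb,
# /dev/sdc). Bind them into one LVM volume group, carve out a
# logical volume, and put a filesystem on it for export.
pvcreate /dev/sdb /dev/sdc
vgcreate storagepool /dev/sdb /dev/sdc
lvcreate -L 10T -n shares storagepool
mkfs.xfs /dev/storagepool/shares
```

Expanding later is then `pvcreate` + `vgextend` on the newly imported LUN, followed by `lvextend` and an online filesystem grow.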
Very interesting challenge, especially with SMR HDDs. 45Drives has a 3-year-old video using OpenCAS to accelerate Ceph HDDs with flash (Optane, I believe). Maybe a little SSD in the mix could bring the performance up? Keep the great content coming!
OpenCAS seems interesting. Thanks for letting me know about it. I could see how that could help a lot with these drives and Ceph.
Tell your friend to deal with the pirated porn himself 😂
Wonderful explanation and details. You should have told us newbies what SMR was. I'd love you to go over my [very small/not awesome] server setup, as I'm wanting to add a PowerVault to my R330 and upgrade the R330 in the future. Love your content!!
Similar servers here: 900TB of enterprise SAS drives mixed with 1TB SSDs, TrueNAS Scale.
I never can make my mind up on this stuff. I just know I want a lot of space, decent redundancy, and gotta go fast.
This is LITERALLY the exact issue I'm having with my HP Z440 Unraid build... bought a rear PCIe bracket to attach an HDD to, and I'll see how that goes.
How is TrueNAS with ZFS a clustered solution?
Interesting Problem. Thank you for going through this with us.
What does the client's backup solution look like? Maybe do away with RAID entirely, as it makes very little sense for a home environment; do a JBOD, maybe with mergerfs on top for convenience, and just restore data from the backup if a drive dies, no resilvering needed... Or are we talking the classic homelab strategy of "RAID is sexy, backups are not, let's rely on RAID to keep my collection of pirated movies safe"?
I typically go with RAID due to the higher speeds, and it's often easier to replace a failed drive in a RAID than restore backups. Using RAID also allows a drive to fail without affecting operation at all. Mergerfs/pooling with good backups can also make a lot of sense in situations like this, but we wanted more speed than a single drive would give here.
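For reference, the JBOD-plus-mergerfs approach floated above is a one-line mount; the disk paths are placeholders, and the option set is just one common choice, not a recommendation from the video:

```shell
# Pool three independently formatted disks into one view at /mnt/pool.
# category.create=mfs places new files on the branch with the most
# free space; since there's no striping, a dead drive only loses the
# files that lived on it.
mergerfs -o defaults,allow_other,category.create=mfs \
    /mnt/disk1:/mnt/disk2:/mnt/disk3 /mnt/pool
```

The tradeoff matches the reply above: each file's throughput is capped at a single drive's speed, which is why RAID was preferred here.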
A video I would like to see: how to set up a small 3-node cluster that serves (a) 18+ TB of object store for many tiny files, accessed over a ZeroTier network with 1TB of local file caching at each remote client, i.e. POSIX LAN-style file access over a high-latency WAN, and (b) high-speed object, file, and block store for a compute cluster. The issue is that a devops cluster and a remote file server are really hard to manage. Can't use SMB or NFS over the WAN. Massive binaries for chip development tools make scaling our Docker images across the cluster hard. Object stores are typically not great for POSIX-style file access over the WAN. Coherency of a decentralized file cache is a problem. ZeroTier is expensive if each VM/container in the cluster has its own ZeroTier-managed IP address. There are difficulties managing the IP and single sign-on address space for both the ZeroTier WAN and the local high-speed LAN for inter-VM/Docker communications for CI/CD workers. Basically, it's insanely difficult to convert an old-style LAN at a software development company and decentralize access when all workers are working from home.
Think about getting the customer to think about Optane and/or PCIe SSDs too, even in RAID 0; it could change the equation on performance, but probably won't help rebuild times. Maybe have some hot spares? A followup and/or special on an ultimate NAS that is still somewhat budget would be appropriate. Since prices are normalizing on all this old enterprise gear, I think more people will be looking for high-performance but still economical solutions. There has to be a sweet spot out there that balances the hardware and software for an optimum solution, but there are a ton of variables, so finding exactly what will work best is not easy for sure. #tylenol christ
Great video.
Great video, very useful. Thanks.
Good luck with week long rebuilds and multiple vdevs, lol.
Especially considering how 24h+ rebuilds already accelerated dRaid development and adoption.
Yeah, those rebuild times scare me. I really should have tested the dRAID rebuild times to see how much they help; this is one of those cases where dRAID really starts to make sense.
Best practice: XCP-ng and TrueNAS Core for Business Users and Proxmox and TrueNAS Scale for Home Users. No proprietary storage solutions, if they use closed source software.
That's what I am trying to figure out for work at a school.
I should give XCP-ng another try. What do you like about it?
@@ElectronicsWizardry The only open source alternative which has similar features to VMware ESXi. It’s not perfect yet, but that will change soon due to the needs of the market.
@@clee79 Strong disagree, Proxmox is practically a drop-in replacement for VMWare. And in my opinion, XCP-ng is nowhere near the level of features that Proxmox has and is much more limited in hardware support.
Change all the 5TB SMR hard drives.
No MinIO testing?
From the looks of their docs they wouldn't meet my needs here, so I didn't want to test something that probably wouldn't work. But I'll likely give them another look when I do more cluster testing.
Thank you for sharing
Nice. Thank you
Linstor can be an option too.
Excellent as always
ceph CACHE TIERING?
I haven't tested it, but it looks like Ceph wants to do a sync (i.e., the data stored in its final locations) before it will confirm a write, so I don't think a cache would work here (but I may very well be off base, as my experience with Ceph is very limited).
That was my thought too. Tiering would help with some writes, but once the fast tier was full, the slow HDDs would still be very limiting.
I think I set up a tiered Ceph configuration once and had the same issues with the extremely slow HDD tier, so I skipped it as an option. SSD-only with these servers also worked well.
deprecated/do not use.
@@ElectronicsWizardry Cache tiering is deprecated in Ceph now and isn't maintained; they recommend using NVMe WAL/DB devices instead. Which, as you noted, won't help with sustained sequential writes.
NB: consumer SSDs suck as Ceph journal devices. Ceph will kill them rapidly with the vast number of writes it does, and their performance isn't anywhere near enough. You really need enterprise SSDs such as the Intel range.
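For anyone wondering what the NVMe WAL/DB layout mentioned above looks like when creating an OSD, a sketch with ceph-volume; the device paths are placeholders:

```shell
# Create a BlueStore OSD with its data on a slow HDD and the
# RocksDB metadata (which also holds the WAL by default) on a
# partition of a fast NVMe device.
ceph-volume lvm create --bluestore \
    --data /dev/sdb \
    --block.db /dev/nvme0n1p1
```

One NVMe drive is typically partitioned to serve as the DB device for several HDD-backed OSDs, which is why its write endurance matters so much.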
Good video. I think you should just shave your head, it would look a lot better over the long term. :) GL man.
Only idiots use SSDs for storage. 🤦♂️
Only an idiot would comment something like that... Our multi petabyte ceph cluster is flash only...
Bad "solution" choice.
Research before trying to implement.
Care to explain more? What would have you chosen here?
For me I'll take a bunch of free USB drives and raid them together in windows. Make a folder share, boom. Done ✅
@@dominick253 LOL
I hope you do not take it as a personal attack. You are in the media area, so I think you may consider the JerryRigEverything or Joe Rogan style for your hair. Again, this is not meant to offend. I'm a fan of your content.