Another reason I use ZFS is the utilities that it provides, such as snapshots and send and receive. They make it easy to backup and restore. Even with hardware raid solutions, I would still put ZFS on top of the logical volume.
A very important information needs to be added... These cards are designed for servers, where a lot of air is moved from the front to the back by many fans. These RAID cards get hot and they die if not appropriately cooled. I learned that the hard way! Fortunately and as you told, buying a new card of the same type (LSI 8260-8i) brought all data back. Would be nice to see a test of different 9xxx controllers of different ages and revision to get a feeling about how hot they get.
Looking at the different generations of LSI/Avago/Broadcom RAID cards seems fun. I'll start collecting them if I can get my hands on them. I've always been curious how the NVMe ones work too.
I added a fan to cool my LSI card after one burn out. I also replaced the thermal paste every few years. And yes, I did the same when the card failed. Just replacing the card (even with a newer model) brought the RAID array back.
This is a fantastic video I'm still learning ZFS I'm a big fan of hardware raid because it just works . Most people that hate on hardware raid have never worked in a professional environment. I run my own personal servers on nothing but hardware raid I source them used from eBay. I always grab a backup controller in the event of an emergency we are talking $50.00 to $80.00 for the used Dell or Lenovo branded controllers.
One issue I'm running into in the wild as MSP is clients wanting gig or multi-gig networking, but their drive arrays just aren't fast enough to give them that performance for internal resources. Higher internet bandwidth than local is certainly not the example case I was expecting, but I've been trying to push clients towards better drive performance choices for a while now. Thanks for helping me to be more prepared! The only luck you get is the luck you build in preparation.
I use TrueNAS Scale on a old rack server I bought cheap on eBay, so ZFS for me. My server is a Dell PE 210 with one processor and only enough room for two drives, which is fine for my needs. The server is quiet enough to live in the same room with it, occasionally the fans ramp up but not for long. It is fast enough to copy or download large files, I mean it might take a minute to get it done sometimes. My network is slow anyways so I'm not sure where the bottleneck is. Loading files like RAW camera files is plenty quick, and I have no problem streaming video to my TV. I do appreciate your work here, I did not know much about hardware raid, so I was mainly watching for info about that. Thanks.
That requires a separate drive right? I might look into that in the future, but when planning this video I already was doing a lot of testing and decided to skip the mdadm journal drive testing. I might look into this for a future video though.
@@ElectronicsWizardry Definitely a separate drive. Its in the official manpages but often not really known to alot of people. Thanks for doing the raid testing as well!
Most newer raid cards use nand and super caps (when power is lost it dumps the raid ram cache into the nand on the raid card) also some support swappabble cache so if the raid card fails while it was online you can move the cache to the new raid card This is assuming you have enabling the write cache mode wirch uses the cache on the raid card
Yea I was debating on going into more detail about how supercaps have gotten much more common recently, but decided not include that to try to keep the video shorter. Do you know the model of one of those cards with remove cache? I've seen some that use a little RAM like board for swapping out the cache, but it seemed to mostly be for upgrading to more cache, not keeping cache form a failed card.
@ElectronicsWizardry need to specifically look for the ones that have NAND on them , the configuration of the hardware raid array (should be stored on it) and uncommited data is stored on it as well so when the card is replaced and the nand module is plugged into new card it should just work or it ask to restore previous/unknown configuration only thing you must do is make sure both cards are using same firmware (update both cards once and don't update them anymore, this is so you know both cards have same firmware) The issue with the hardware raid or HBA card not working on consumer hardware will be due to the smbus pins (buy cards that don't use smbus pins or use caption tape over the pins, flashing the firmware for IT mode doesn't disable the smbus as it's an optional hardware feature that can be implemented by the raid/hbsla card, typically dell and hp raid/HBA cards will have the smbus wired on the card for BMC/light out/idrac use) the issue stems from that smbus is supported by UEFI on consumer boards but the UEFI module is usually missing so the system just hangs on boot or you have random issues with the computer (or it works because the manufacturer of the motherboard actually didn't wire the smbus pins on the pci-e slot it self so the smbus issue doesn't happen).
@@ElectronicsWizardry Hmm a raid card with a built in nvme socket for just using for cache/recovery would be nice to see. I wonder when/if that's available now? I've got an old rocketraid 2740 (pci-e v2) x16 card that I really want to upgrade but every time I do a search for "16 port raid pci-e 3" or 4 I don't seem to see many and most are $500 or something crazy.
Thanks for sharing that! I’m also curious about what environmental setup is best for all NVMe SSDs in a RAID. I mean what raid controller or soft-raid should be used? And what file system is the best fit for SSDs? Would you mind making another video to talk about this? Really appreciate it! 👍
There was no mention about BTRFS (as suggested in the title and description) and you missed the most important feature of ZFS: Checksums! A performance comparison is nice but if you care about your data, there is no way around checksums
OOps, edited the title with MDADM instead. As far as checksums, that is one big advantage of ZFS + BTRFS + others with that support. I will mention that HDDs have internal checksums to prevent data from being read incorrectly, and typically results in a low enough error rate.
My question is, what happens, if you introduce a one bit error on a drive attacjed to the hatdware raid controller? You have to put an error on every drive on different locations, to be sure, that the controller does not always overwrite the wrong bit on the parity drive for example. ZFS can handle these errors with its checksum. When I tested Qnap QTS it could not do that and the NAS returned faulty data.
I think all the hardware RAID cards I know of will assume the data from the drives is correct, so you would see a error in this instance. Since drives have checksumming of the data along with the SATA/SAS interface the assumption is the risk of reading incorrect data is low. I should do a video looking into this in the future.
@@ElectronicsWizardry there are RAID controllers + SAS drives that have their own checksumming, but it's not something you'll be realistically able to setup in a home lab
I already had data corruption problems with several files being unreadable. I did not recognize it for a longer time period. So my backups were also corrupted. I still do not know, what the reason was. Bad Sata cable, faulty driver, faulty CPU or RAM? As I used hardware raid at that time, it might also have been the raid controller. I value zip files since that problem, because they do a crc32 checksum, so you know immediately, which data is correct.
The main reason why you dont use hardware raid is because of reliability especially with modern filesystems which are designed to work with native raw drives. You cannot depend on hardware raid to communicate properly especially on consumer hardware. Unless the raid card is certified to use with the filesystem like zfs and btrfs stay far away or use it in IT MODE. With hardware raid cards you also have to worry about drivers and software support for your OS.
No raid card is compatible with zfs, as far as I am aware. It needs raw access to the disks, and a raid card will not allow that. Not sure about btrfs. A raid card without the raid ability (just connects disks) is just called an hba.
@olefjord85 that is specifically called IT mode or "hba" mode. Any means of using a raid card in the mode it was designed for (raid mode) means trouble with zfs
@ so back to compare ZFS to MD and a lot more options of configuration for ZFS than MD. So not easy to compare because depending on the files (size, content) and the usage (sequential read or not, concurrent access, read or write, only access to recent files or totally random) the best is not obvious and obviously not XFS in every case.
Very good overview of the various implementations. I would like to stress that the comments at 1:58 are what really makes hardware RAID unattractive nowadays: hardware RAID controllers have gotten expensive. In addition to the make and model being correct to move data in case of hardware failure, firmware versions are also important here. Back in my data center support days, we always had a cold spare of it laying around and never got updated firmware until it was ready to replace the failed unit. There have been times were down grading firmware was also the path of least resistance to bring back a storage system. Another interesting feature of some select RAID cards is that they off their own out-of-band management network ports with independent PoE power. This permits setting up and accessing the RAID controller even if the host system is offline. The write cache can be recovered directly from the device, archived safely and restored when host functionality returns in a recovery mode. Automations can be setup to copy that write cache on detection of host issues to quickly protect that cached data externally. Lastly a huge feature for enterprises is hardware RAID card support for virtualization. This permits a thinner hypervisor by not needing to handle the underlaying software storage system for guest machines. All great enterprise class features that are some of the reasoning why RAID controllers are so expensive. (That and Broadcom's unfettered greed.) ZFS has something similar to journaling which can be separated from the main disk system called the ZFS Intent Log (ZIL). Similarly there are options for separate drives to act as read and write caches. Leveraging these features can further accelerate ZFS pools if used in conjunction with high IOPs drives (Intel Optane is excellent for these tasks). Redundancy support for the ZIL, read cache and write cache is fully supported. CPU utilization here is presumably higher but I haven't explicitly tested it. I have seen the performance results and they do speed things up. Common setup is using spinning disks for the bulk storage with these extras as fast NVMe SSD. Costs of using these ZFS features used to be quite high and comparable to the extra cost of a RAID card but storage prices of NVMe drives has dropped significantly over the past few years changing the price/performance dynamic. 5:27 Both hardware RAID and ZFS can be setup with external monitoring solutions like Zabbix for monitoring and alerts as well. In the enterprise world these are preferred as they're just the storage aspect for a more centralized monitoring system. Think watching CPU temperatures, fan speeds and the like. They don't alter the disks like the vendor supplied utilities or software tools, but they do the critical work of letting admins know if anything has gone wrong. One last thing with ZFS is that it can leverage some newer CPU features to accelerate the parity computations that your older Opteron may not support. With ZFS's portability, you could take that disk array and move it to a newer faster platform and speed things up that way. I would also look into various EFI setting for legacy BIOS support on that i7 12900K board. Ditto for SR-IOV which that card should also support.
Cheaper 10-15yr secondhand (thirdhand?) PCIe-2.0 cards top out around the numbers listed here. For example the Areca 1880 series are incredibly rock-solid cards, and they work just as well without a battery...just make sure the server is on a UPS and shut it down before the UPS dies ;-)
BTRFS raid6 user here 8x8tb. using raid1c4 for metadata and raid6 for data to mitigate the write hole bugs until the raid stripe tree is fully released. performance is ok, and requires very little ram, (2gb-4gb of ram is fine).
I've been using all three options for a little over 10 years now... in up to small enterprise systems, with roughly three generations old hardware. And I'm still a huge fan of hardware RAID. But... not all hardware RAID is created equal. I've used PERC, LSI, Adapted and Areca hardware RAID solutions... and while Areca has since changed ownership, their hardware has hands down been my favorite, for both ease of use and raw performance. I personally think hardware raid options have gotten a bad rep, because it's not a system that most people have to deal with on a day to day basis... and recovery can feel archaic and intimidating. But the shear portability of a nearly platform agnostic hardware raid array, from one system to another is very hard to beat. My experience with MDADM RAID has been a bit different than what was presented here when compared to ZFS and hardware RAID. I've found MDADM RAID-5 to be nearly as fast as hardware RAID, on similar hardware, even through a SATA port expander. ZFS, to me, even with a stout CPU, 96gb of ram and SSD cache... still didn't perform as well as the RAID cards I was used to... granted I realize that none of my testing was cutting edge, but this is also the hardware that was available in our budget. Software RAID is cool, and ZFS data integrity is second to none. But the hardware overhead of software RAID, in my experience, will rarely perform as well as purpose built hardware. At least that's my two cents. I appreciate the videos, and look forward to more!
I went ZFS route back when I've setup my proxmox multi vms combo server back in 2021. So far (knock on wood) after 2 separate drives failures no data was lost. Before I was using either proprietary NAS solutions (qnap) or builtin motherboard raid configurations and this always ended up with partial or complete data loss. Thankfully after my first disk failure I always kept some sort of external backup so even though qnap and raid solutions failed me I had some way to restore my data. I'm not saying ZFS is bulletproof (look at LTT situation from couple years prior), but if you do regular pool scrubs and extended smartctl tests - so basically you won't let your ZFS pool to "rot" then I'm pretty sure ZFS is the best their is (so far).
I use a second hand 8 port hardware raid card with the raid bios removed so just passes the disks . I've been using ZFS for the last few year so almost maxed out my 5 drives , so can't wait for single disk expansion which should be here soon.
Been using hardware RAID 6 and so far I'm pretty happy with it. Initially started with RAID 5 for the speed because I planned to also use it for games but a faulty 3.3v adapter and getting a good deal on 2 4TB PM1733's caused me to switch to RAID 6 for peace of mind. Having the ability to up and move to Linux whenever it gets good without any hassle was a huge reason as well. The only solutions I found for cross OS's was a VM which was out of the question because of gaming and an entire other PC which was out of the question because lol no.
You should _always_ use a form of ssd as a log device for a raidz1 or raidz2 if you want a decent performance. An alternatve is to force the pool to be asyncrons, but then you can lose up to 5 seconds of data. One of the best log devices you can use are smallish optane drives, just avoid the 16GB ones since their sustained write perfomance is to low.
MD raid you seem to have journal enabled (possibly bit mapping enabled as well as that avoids you having to do a full rebuild when a drive is removed and re plugged back in or a unclean shutdown) Unsure if it's a feature of mdadm but on Synology it can skip Unallocated space on rebuild
If you can’t get into the pre-boot menu on a consumer board it might be because you didn’t go in and enable that function. You might also have to disable fast boot. Specifically there is an option though that enables the pre-boot screen to show.
It's called Option ROM, and there are Legacy and EFI ROMs. Typically older cards don't have EFI compatibility, which is why they won't show up at boot.
Very pertinent. I switched from a 4th-gen Celeron-powered build with MDADM (and an extra PCI SATA card) with 6 disks, to a PowerEdge R510 with an H700 Perc card. MDADM was especially slow with writes, much slower than the Perc. However, Raid-6 performance isn't impressive either.
Performance is important, of course, but for some of us the power usage might be another important factor to consider. Some time ago I was using HP P410 RAID controller and it was increasing my server's idle power consumption by 20W. That's why I decided to switch to software RAID based on ZFS.
@@chaoticsystem2211 Why, im still using card and integrated with raid 6 configurations. If guy cant effort better, you can have dl380 g7. It works with ssd drives for at least 7 years now.
That hardware RAID card is freaking old. It's a really bad comparison for 2025 - The difference between PERC 11 and 12 is huge in itself. Should do this again with a H965i to show what 2025 hardware can do.
Its an OK comparison since ZFS and MDADM are running on an ancient platform. Also because it can be purchased for 20$. Makes it easier to compare since ZFS is free.
The opteron server he was running these tests on is very slow. Modern cpu's are so fast now that software raid would probably show much lower utilization. The comparison with old hardware raid cards is tricky because they likely won't run on modern uefi motherboards, especially consumer ones. He kind of said this at the beginning of the video.
Have an Adaptec ASR-8805 with BBU in a consumer ASUS motherboard in UEFI mode, running for several years with Proxmox, no problem. Bought 2 of these controllers they were so cheap in case of issue.
Be careful with the results of this video. The system he's using has a very weak and underpowered processor in terms of single core IPC and without any additional information his performance problems are most likely localized to his system configuration. I've been using mdadm for years on a variety of different configurations and the only caveats were slightly lower. Random read speeds and in some cases higher disc latency than hardware RAID arrays but other than that, as long as you have a decent CPU (a 10-year-old optron is hardly a good test candidate) It's a viable solution for Budget conscious individuals. Fun! Fact: Intel maintains the entire software stack for mdadm and it's implemented as part of their Intel VROC solution on high-end server grade motherboards.
Just switch from HW Raid to ZFS runs on SAS3008 HBA, I used to had weird issue with HW raid card for about to 2 years. The kernel driver just failed for some reason, I try to fixed it, spend hours surfing on forums , played with BIOS(UEFI) setting , the issue were gone for mouth, But it went back recently. Then i start to thinking i might have faulty HW raid controller, but i'm still not sure lol, So i get a HBA setup ZFS, I'm so impress the functionality from ZFS. One of the best decision I've made.
I don't know if it was posted already or not, but if you are able to set the "Storage Oprom" to legacy on consumer boards, that will allow you to use the built-in managers for the different raid cards. But given the move to uefi on everythnig, its a dying thing to see legacy features on newer mainboards.
MDADM and ZFS have the advantage that if the hardware dies, your data is easily recoverable - a new Linux PC, USB adapters and such can re-assemble and mount the data quickly and easily. RAID controllers can be fussy often needing the same firmware version so you need to buy 2 cards and keep them in sync so hopefully there's a chance you can take one set of drives and put it with the other controller. MDADM and ZFS have the ability to auto-probe and assemble drives. It's all about what happens when failures occur, and while MDADM and ZFS performance doesn't match hardware, knowing that all I need is a Linux live boot image, a spare PC and some USB adapters (if necessary) means I have easy access to the data should the Linux server decide to fail.
Your results agree with my experience. HW Raid, for small and medium systems, is not needed. I still see it for large enterprise SAN nodes, but not in this classic form. And, at least for desktop type machines, the move to solid-state storage has changed the performance equations yet again. But assuming we're talking bulk storage, AKA spinning rust, I would use ZFS over just about any other choice, especially in a NAS application.
Yea SSDs change the performance calculation a good amount. With HDDs its much more common to be IO limited than with SSDs. I choose HDDs for testing here as there much more common in home server and NAS use, and since HDDs do much worse in some workloads like random IO I thought it would be best to test with random IO. With how well this video is doing, I might look at SSD arrays, and try to get one of the NVMe raid cards to see how they work.
@@ElectronicsWizardry Yeah, that might be fun. One of my servers has an LSI card that does NVMe. No RAID in hardware for that, but the Kioxia U.2 drives I have it hooked to give insane I/O throughput. Perhaps you could do a ZFS comparison with using the SSD's in conjunction with the HDD's, either as special devices, or just cache. (And why cache drives aren't what most people think when it comes to ZFS)
Yeah, ZFS (even stripe) with NVMes and 4 fio threads with fsync=0 just max out the CPUs, or for practical case couple of VM guests maxing out their io on same zfs pool - same results on host CPU. For HDDs would choose zfs any day
Yes, solid-sate storage has completely changed the game for where you would need to use RAID. As for a NAS..... ZFS is fast becoming the filesystem to use. But nobody talks about SNAPRAID. I've been using that to store my media files for years. Never had any issues with it. And best of all, it only spins up the drive where the data is stored, rather than every drive in the array.
@Andy-fd5fg yea snapraid is flexible and does well in home media server like environments. It struggles with lots of changing files and only operates at the speed of a single disk. Unfortunately there is no perfect storage solution so it’s a pick your compromises when setting up raid or raid like solution.
The reason I use ZFS in every computer I own is because I can have a Hard Drive for bulk storage, NVME for Special Device and ZIL and RAM for fast cache.
One of the main reasons not to use hardware raid is that you might end up dependant on that specific card. If it fails and a replacement (likely second user) card cannot be found, then the data on the drives could be inaccessible (at least without having to pay for expensive data recovery services). Software RAID gives you the hardware independence needed for arrays that might still be in use in a decade’s time.
first raid card was 29160 purren 10 thousand rpm, drives were crazy expensive back in day, used in DAW station, what the average drive & lappy can do today is amazing, at low cost. awesum vid - dank yoo
Typically a replacement RAID card will detect a array and import it. Generally if its from the same manufacture with the same model or newer you can import the array. But the hardware requirements are much stricter for importing the array than software RAID solutions. I generally like to rely on backups in case something goes wrong with the whole array, but having ease of recovery is still a feature if things go wrong.
@@ElectronicsWizardry What if data was in-flight when HW RAID card failed? Would entire array become inconsistent? Isn't it the same concern as write hole problem? Thanks for the video btw, it cleared up some concepts for me
@@keonix506 Unfortunately I don't have experience with a failing RAID card, and there isn't a easy test this I can think of as I don't want to physically break a RAID card. My gut is that a RAID card failure mid operation would cause data in flight to be lost, but likely keep the array still recoverable on a different card or with data recovery utilities. From what I've seen design wise, I think the assumption is that RAID cards don't fail that often so there not designed to be easily replaced hot like RAID cards, but unfortunately I've seen a good amount of RAID cards fail in my time(I don't recall this happening during operation though)
From what I understand, a hardware RAID generally doesn't allow you to use for example an NVME SSD as a cache. So wouldn't a software raid with such a cache generally surpass performance of hardware raids? Especially for random reads/writes?
I think a few Hardware raid cards supported a SSD cache, but this feature has since been removed in newer product lines to my knowledge. You can still add a cache in software with something like bcache in linux. The annoying part of a cache is they can help a lot in some workloads, but almost none in others. If your doing random IO across the whole drive, expect almost no improvement from adding a cache as it would be nearly impossible to predict what blocks are needed next. If your accessing some files more than others, a cache may help a lot. I'm generally a fan of a sperate SSD only pool if you know some files are going to be access more often than others. Like a SSD pool for current projects, and a HDD pool for archive projects. But this can add complexity and depends on your exact workload.
I didn't test the journal device for this video(I already spend a long time setting up these arrays and rebuilding them). I might look into mdadm journal devices later if people are interested.
@@ElectronicsWizardry ZFS can have a special vdev on nvme, which helps a lot in certain cases. metadata is always stored on nvme (crazy fast directory index loading!) and you can choose (per dateset if you want) up to which block size shall be stored on nvme instead of HDD. attention: if blocksize==recsize, everything will be stored on nvme.
I really liked your video but ... What happens if the volume or array runs out of space. Can i just add another drive and keep going? Unraid can do that.
With parity RAID, most hardware raid cards support adding a array, MDADM has supported adding disks to arrays for a long time, and ZFS has added this feature recently. In all of these examples the drive has to be the same size as the existing drives(larger drives houdl work but the extra space won't be used)
@mimimmimmimim Unraid can add a volume of any size up to the size of the parity drive. Nice thing about Unraid. Not restricted to having all the drives the same size.
Yea unraid is super flexible with its storage and adding more later. Unfortunately each storage system has compromises and I think Unraid has limited speed compared to other solutions as it’s often limited to the speed of one disk.
MDADM doesn't have a journal... the filesystem you use on it has the journal. I'd suggest you do some tests using different filesystems..... BTRFS, EXT4, XFS.... there maybe others you want to look at... is JFS still kicking around? And as others have pointed out, you could move the journal to SSD's perhaps a mirrored pair. Also look at MDADM "chunk" size. Many years ago when i was playing with MDADM and XFS i had to do some calculations for what i would set for XFS "sunit" and "swidth" values. I expect BTRFS has something similar. (Sorry, can't remember the exact details of those calculations.)
I think I understood how MDADM does its journal, I thought(incorrectly it seems) that it always uses it like ZFS log, but it seems to only use the journal when a journal device is connected. I think I tried a few other filesystems and didn't see a performance difference. Since I was mostly trying to test RAID performance I stuck with XFS as I didn't see a big difference between filesystems and fio performance, and wanted to keep the tests rounds down(It took ~3 days for each RAID types to be tested as I had to wait for the initialization, then write a 15TB test file, then do a rebuild) I should check MDADM chunk sizes, that easily could have been the issue here.
@@ElectronicsWizardry Sounds like you need to acquire some smaller drives, 1tb perhaps. I know they aren't good for price per TB, but it would save a considerable about of time for tests like these. Even if you don't, a follow up video testing just XFS with a separate journal drive, and tweaks to the chunk size to get better performance out of MDADM would be a good topic.
I do have a pile of 1TB drives. I should have remembered to use them instead as the whole drive can be overwritten faster. It seems like looking into MDADM would make a good video and I'll work on that in the future, It will be a bit of time though as I have some other videos in the pipeline.
MDADM has a "bitmap" which allows it to only resync the recently changed data when a device fails and then comes back, but it's not a journal. But Device Mapper has dm-era module that does something like this, I think.
Vroc to my knowledge is for NVMe only drives on specific platforms. I'm not sure how it does performance wise, but I think it uses a bit of silicon on the CPU to help with putting the array together for booting from, but I think much of the calculations is still done on CPU cores with no dedicated cache. I might make a video about it if I can get my hands on the hardware needed for VROC.
The problem with hardware raid is when something goes bad, no tools ... RAID5 and 6 with mdadm are terrible, especially with fast storage. Some work is being done to fix that but when will that be included in the kernel is anyone's guess. For now disable bitmaps and try to use power of 2 number of data disk (4+2 for RAID6 for example). That should fix some of the issues you are seeing.
I should take another look at storage spaces. Its been a bit since I've done a video on storage spaces, and I think server 2025 changes some things. I decided to skip it here as I am more familiar with Linux and adding a second OS to testing adds a lot of variables.
I don't think its been announced what Debian 18's code name will be. The latest codename I think they have public is for 14 with Forky. I'm guessing they still have a lot of toy story characters to go through.
My hardware raid card works funky with my consumer grade motherboard. Add it's a 10 year old motherboard and you have all sorts of funny. But it works.
@@cheako91155 and BtrFS's raid56 implementation is broken so not just does zfs not suffer from that issue but also has snapshots - that's already a 2:0 against your nonesense reply as for sun: they were bought by oracle a long time ago and hence no longer exist
Just my 2 cents about Hardware RAID controllers with spinning rust. If you are doing anything live and you need to ensure that you have real world up time , the hardware raid controller is by far and hands down the way to go. They can be an absolute pain to set up and you do need to have a spare card on hand for emergencies that do come up.
as you are using a Perc card which is a rebranded OEM LSI MegaRAID 9361-8i, if the RAID controller fails, and you are using Linux you can use an HBA + MDADM to import it and run it like normal to recover any data you need till you get another RAID controller with the same firmware
haha I've actually been booting my Windows Server off a RAID 1, 2-drive array for like 9years. This is on a Dell R510 with a H700. I've never heard you can't boot from RAID, but then again I've also never tried using software RAID
I am running unraid with two zfs pools. I can get almost full Read and write speed out of my exos hdds. So Software raid isnt a bottlenecks now days if there is enough cpu power available
I stuck with HDDs here as there most common in home server and NAS use. SSDs change up a lot of the performance calculations as they are so much faster things like the CPU and bus speeds are much more likely to be the storage limit than the disks themselves.
There is so much to ZFS that I don't understand, that it seems too dangerous to me. I also don't like how they don't focus one bit on performance. You can now expand RAIDz, but doing so is nothing like expanding traditional RAID. Performance drops with each drive added. It is not a complete resilvering. Dumb.
Great video! My few cents on this topic 1) Hardware RAID with writeback cache and single card is pure evil (TL;DR just don't do that) - whatever redundancy you get from RAID geometry is swept away by reliance on a single piece of hardware. When that controller fails, your data is in an unknown state, because some writes might be in the cache and thus lost. Also, when that battery fails, performance drops off the cliff because you can no longer use writeback cache (if you value your data), at which point if you plan for _not_ having the cache, then why rely on that piece of hardware at all? There are ways to do this properly with multiple cards that provide redundancy and synchronize their cache, but that's usually vendor specific and I haven't had much experience dealing with them, those make sense mostly if the setup is large, like multiple SAS expanders with tens or hundreds of drives (and usually hosting a huge Oracle instance or something like that). 2) MDADM is great for OS partitions because of its simplicity, but also for performance-sensitive workloads like databases and is extremely flexible nowadays when it gets to expansion or even changing between RAID levels (if you are ready for the performance penalty when it's rebuilding). But it lacks the features that make stuff like ZFS great, like snapshots (yes, I know you can use DM for those, but has anybody actually tried using that productively?) 3) ZFS is what you use if you actually value your data because of all the features, like streaming replication, snapshots, checksums... but you lose a lot of performance because of CopyOnWrite nature of the filesystem. This is what I use as long as I can live with that performance penalty. This should just be default for everyone :) 4) BTRFS - just don't (unless you are an adrenaline junkie). Every single BTRFS filesystem with more than light desktop use I ever encountered broke sooner or later. Very common scenario is colleagues claming BTRFS works just fine for them and me discovering I/O errors in kernel.log that were simply overlooked or ignored and resulting damaged data that were never checked. 5) there are many seldom-used but very powerful options for gaining performance, for example putting your ext4 filesystem journal on a faster SSD or adding a NVDIMM SLOG device to your ZFS pool, it might be interesting to see a video on those :) Some more notes on your testing 1) FIO is complicated, IO engines are a mess. I spent good few months trying to get numbers that made sense and correlated to actual workload out of it. Sometimes you're not even testing the bottom line ("what if everything was a synchronous transaction") but just stress-testing weird fsync()+NCQ behaviour in your HBA's firmware. Not sure I have a solution for this, testing is hard, testing for "all workloads" is impossible and you can get very misleading numbers just because you used LSI HBA with unsupported drive, which breaks NCQ+FUA, which makes all testing worthless (ask how I know). Btw LSI HBA in IT mode is not as transparent as an ordinary SATA controller. 2) CPU usage ("load") might not seem that important when what you're storing are media files, but when you're running a database it translates into latency... except when it doesn't, which is what I suspect happened to you, because FIO wasn't waiting for writes to finish (so not blocked on I/O) which is what "load" actually measures. On the other hand, if you throw NVMe drives in there then that CPU overhead will start making a huge impact inside your database/application which can just throw away that thread immediately etc.. 3) HW RAID controllers are capable of rebuilding just the actual data in an array, but that requires a supported hardware stack (drives and their FW revisions) and functional TRIM/discard in your OS. 4) ZFS performance testing is usually non-reproducible with faster drives or when measuring IOPS for a couple of reasons like fragmentation, allocating unused space vs. rewriting data, meta(data/slab) caching, recordsize or volblocksize vs. actual workload and weird double caching behaviour on Linux. Just try creating a ZVOL (let's say 2 times larger than your testing writeset) and then run the same benchmark multiple times on it, you'll start seeing different number on every run, trending downwards :)
Thanks for the feedback. I noticed the card in IT mode seems to be realize it cache to help single disks. This affected my testing and trying to get good results, but in the real world this should only help(unless there is a bug with the cache, but I guess this could affect any disk controller, the cache on the card is error corrected) The more storage performance tests I do the mode complex it becomes and more variables I realize I have to control. I know these results are far from perfect and can't be extrapolate to every use case, but I hope these numbers are better than the often vague descriptions of performance I see online, like slow writes, or no practical CPU usage difference. I hope others can use these tests as a jumping off point to running more tests specific to their workloads. Load averages likely weren't the best for seeing the CPU usage. Looking back running a CPU benchmark at the same time might have been a better metric. It was pretty reproducible though in my testing.
@@ElectronicsWizardry IT mode shouldn't use cache at all. In the past I used IR mode with RAID0 of single drives to get controller-based writeback cache (backed up by the battery), but in IT mode this is not possible. Just beware that IT mode is not transparent at all. I am biased against LSI adapters based on my past experience, but they have sort of become the standard for home NAS/homelab setup. For small setups, I think it's better to get a cheap SATA controller. (And lots of AMD boards can switch U.2 ports to SATA mode drectly to the CPU which is awesome).
@@zviratko I might test it again, but it defiantly seems to used the RAM as a read cache on that card when set to IT mode. I should look into this more and might do some testing in the future Do you have a model of sata controller you suggest?
All the tests were done over a ~3m period, so the writes couldn't just be dumped to cache without testing the whole array performance. The cache helps with sustained write performance, so I'm far from jsut testing the cache..
9 วันที่ผ่านมา
ZFS also cares about your data by doing checksums. Hardware RAID cards lie a lot and will happily spew out any garbage they have written earlier.
I skipped BTRFS as its RAID 5 and 6 solution isn't listed as fully stable yet. I like how BTRFS RAID is very flexible with adding and removing drives and mixed drive size configs.
So hw hba cards are only used nowadys in servers in something like a multi cluster jbod atorage set up where you might want redundant data paths... Reason md raid was slow in your dual socket is probably due to the fact its old. Everything uses md raid in the server world now.
Hardware raid is really firmware raid, its software “burned” onto a chip. Does that sound like it’s easy to update? It’s software and it will need to be updated. Remember when firmware was a joke? Its software written to the “eeprom” so between software and hardware is firmware. LOL.
Be careful with the results of this video. The system he's using has a very weak and underpowered processor in terms of single core IPC and without any additional information his performance problems are most likely localized to his system configuration. I've been using mdadm for years on a variety of different configurations and the only caveats were slightly lower. Random read speeds and in some cases higher disc latency than hardware RAID arrays but other than that, as long as you have a decent CPU (a 10-year-old optron is hardly a good test candidate) It's a viable solution for Budget conscious individuals. Fun! Fact: Intel maintains the entire software stack for mdadm and it's implemented as part of their Intel VROC solution on high-end server grade motherboards. One other caveat is nearly all modern hardware RAID solutions do not scale with high-end solid state storage devices and other nvme drives. Using these type of cards will bottleneck pretty quickly and experience diminishing returns. I've worked with distributed storage systems for years and stopped using hardware raid arrays almost a decade ago.
That opteron server might not have been the choice here as it was probably getting to be too slow for a realistic test bench. I was trying to stick with a slow CPU and slow HDDs as that seems more common in NAS units and home servers. SSD scale differently in performance but I stuck with HDDs for this test. Unfortunately no single platform can be used to extrapolate results for every use case, but I try to get the best results I can with the hardware I have. Hopefully I can get something like a LSI 9560 and a few NVMe drives one day to see how a modern NVMe RAID card compares to software RAID on a high speed CPU. During my testing I didn't notice the CPU threads being maxed out and the disk usage was being maxed out.
Err... Windows have no problem booting from a raid like volume, and have been able to do so since Windows XP days... So that's just plain misinformation you're spreading there. It's a bit fiddly to set up, but it's very much possible.
How can Windows be setup to boot from a software RAID volume? I've check a lot of sources over the years and yet to see a good way to do it. Storage spaces isn't bootabe to my knowledge(I think it was setup in a bootable matter on a surface system, but I don't think that counts).
@@ElectronicsWizardry You're right that storage spaces isn't usable for booting yet. Windows does however have the older raid like tech that was introduced in XP named Dynamic Disks, which can be. It supports simple, spanned, striped, mirror and raid5 volumes, and most importantly here, Booting from it is supported. But, for all usages except booting from a mirror set of dynamic disks, it's also so old that it's deprecated, but not yet removed either. It's even still there in the current previews of Win12, and probably isn't going to actually removed until Storage spaces drives actually are bootable. As for how, just install on one drive. Once in windows, convert to dynamic, initialize another drive as dynamic (but no partitions), and extend the system drive over to that drive. It does however NOT work with dual booting, because it needs both the MBR and boot sector of the partitions to work. Normally you set grub to overwrite the MBR and chainload the bootsector version, but that does not work with dynamic disks.
Yea your right there are ways to do a mirrored boot, but compared to Linux the options are limited, its mirrored only, and uses a deprecated RAID method in Windows. I think its reasonble to say that a RAID card can a good option if you want to boot windows from a RAID array. Thanks for the reply
@@ElectronicsWizardry No no. You misunderstand. Only mirrored boot is still what MS considers current with this method. You CAN boot any dynamic disk type, including raid5 style, and the method as a whole, is not deprecated. It's only for OTHER USES that it's deprecated. As you, I'd still recommend a raid card, that's not the point. But recommendations are different from possible. My point is simply that you shouldn't claim it can't do it when it's very much possible and has been for almost a quarter of a century.
lvl1techs made a great video on this why you should not use this, This is not a viable solution due to it not detecting write errors and bitrot. The current hardware raid cards are not the raid cards from the past th-cam.com/video/l55GfAwa8RI/w-d-xo.html
I watched that video when it came out, and probably should have talked about this issue more in the video. While this is a potential issue, It generally seems to be pretty rare in practice due to the error correction on HDDs/SSDs. I have used many hardware raid cards with data checksumming in software and almost never get data corruption/bitrot. Zfs and other checksummed filesystems are nice and help to keep data from changing, and notifying the user upon issue. My general experience is checksum errors on HDDs is extremely rare on a drive that doesn't have other issues. Keeping bitrot away is one reason why I generally stick with ZFS as my default storage solution, and try to use ECC if possible.
@@ElectronicsWizardry ECC will not do anything against Bitrot, Bitrot refers to the gradual corruption of data stored on disks over time, often due to magnetic or physical degradation. ECC memory primarily protects against bit flips in system memory, not storage, so it does not prevent bitrot. ECC memory for ZFS is recommended for not writing any bit flips to disk, this is even a more extreme measure since bit flips in memory occur even less then writing errors so to on 1 hand being very lax about data integrity using hardware raid cards but then suggesting ECC memory for ZFS on an even less likely issue is very strange. HDD's dont have any error correction, they just write the data it gets, it does not have any way to even know what data would be the correct data so how can you even start to do error correction on it, the filesystem is responsible for that by doing checksums. (yes a HDD has ECC but this is for reading data from the instability of reading such tiny magnetic fluxes) You should rewatch the video, specially at 6min timestamp, there are no cards that do any checksums anymore " I have used many hardware raid cards with data checksumming in software and almost never get data corruption/bitrot" this statement is 1 to 1 equivalent to your grandma saying "I got the age of 90 with smoking a pack of cigrates a day" Good on you, but those empirical statements are kinda useless and specially as "informing" people it only hurts, since i bet you, atleast one person is going to buy a raid card in 2025 because of this video
@@ElectronicsWizardry After Advanced Format drives (512e included) came out, all the different DIF formatted drives became much less attractive because the how powerful the new ECC algorithm AF drives uses (LDPC on 100 bytes per 4k physical sector). This is why most modern hardware raid controllers don't use DIF formats; because it's been made redundant by the checksumming that is running on the drives themselves.
Yea ecc doesn’t help with bitrot but you want all memory and interfaces in the data storage system to be error correcting to prevent corruption. Ecc and disk checksuming help with different parts of the data storage pipeline. Hdds do have error correcting that is hidden from the user available space. That’s how drives know if there reading the data correctly or not. Also interfaces like sata and sas have error checking as well. The assumption raid cards make is the interfaces used by drives and the data on the disks is checksummed so the drives only return correct data or none at all. There are edge cases where this can occur but it’s rare in my experience and I’ve used a lot of drive so if this was common I’d guess I would have seen it. Also a huge amount of business servers are running of these raid cards. If there was a significant data bitrot risk I’d guess this would have been changed a while ago.
@@ElectronicsWizardry I definitely agree that it would be ideal to have ECC throughout the entire pipeline of subsystems (all the SerDes structures and all memory levels) the data goes through, but adding complexity to a design can also introduce new sources of errors; there is probably some ideal balance between simplicity and error checking, and I'd be willing to bet we're pretty close to it in most systems. I suppose bitrot is kind of an all-encompassing term, but the ECC that runs on hdds is being used to checksum every sector that is read, so it would protect against some kind of surface defect or magnetic bit problem below LDPC threshold on the hdd causing bitrot, but it'd be narrowed to correcting only those sources of bitrot and not some kind of wider memory or cabling issue. This is all mostly transparent to the user like you said, you'd have to do into SMART to look for reallocated sectors to discover this was happening.
hardware raid is dead! ceph/zfs leads everything. i went away from both software and or hardware raid now for about 4 years. im happy with my zfs in my homelab!
Yea I'm a big fan of ZFS/Ceph for much of my use I go with ZFS. For many of my uses squeezing a bit more performance out of a array is less important than features like ZFS send/receive and I am very used to how to manage ZFS. Also next week's video is gonna be using ceph in a small cluster so stay tuned.
@ElectronicsWizardry The redundancy and ease of management that ZFS offers are truly unmatched compared to traditional RAID. If you're replicating something like RAID0 or RAID5 and want to avoid sacrificing performance by skipping a ZRAID setup, there’s absolutely no reason to stick with RAID. With ZFS, you get almost the same speed but with far better security! For me, proper error correction is non-negotiable! I refuse to rely on systems that don’t (proper) offer it. While I do like BTRFS, it’s still a no-go because I prioritize both security and performance. Unfortunately, BTRFS RAID setups, as I would need them, are still too unstable. Honestly, there’s anyway no reason to move away from ZFS, even if BTRFS eventually becomes stable for certain RAID configurations. Btw, i like your recent style change with this fancy beard! :-D
I've exported imported zfs pool between several oses, HBAs, mainboards over the year even one on an rpi4 and even on Windows. Nothing beats the flexibility of ZFS.
By the way I just don't get it why people are stuck using raid levels. Raid 5, 6, etc, adrenaline junk much? Especially with the ability to enlarge pools both vertically and horizontally, ZFS mirror stripes (kinda like 1+0 but exaggerated) are both flexible and performance wise preferable. Especially when it comes to resilvering after disk changes, mirror vdevs do not put stress on the cpu and the rest of the drives on other vdevs. They do not require massive amounts of hash calculations like raidz types do (as well as raid 5, 6, 166 etc )... 3-way mirrors combined into a 9 disk pool for example is a beast. Or you can be less greedy and prefer 3 2-way mirrors striped together.
Additionally zfs supports several kinds of cache additions to a pool. Using an nvme for read cache, another for writes, another for metadata and more can be done in order to drastically enhance performance.
One more thing, mirrors do not hit cpu as raidz modes do. Even mirrors stripes are this way.. Still, since every data block in zfs, has their checksums calculated and stored, there's still more cpu overhead with respect to others.
@@mimimmimmimimas for why people stuck with old raid levels - because the industry stuck with them as well - and as they just define basic priciples they're still valid today: raid0 - striping raid1 - mirroring raid5 - single-parity raid6 - dual-parity also: for zfs one has to know that a pool of 2 or more vdevs is an implicit stripe - and hence a pool of N mirrors still is the same as raid1+0 in fact every zpool is a raidN+0 - so if one vdev fails the entire pool fails that said any raid1+0 CAN take up to half of its drives fail as long as only one drive per mirror fails but a raid1+0 will also fail from one double failure when both drives of the same mirror fail so depending on your usecase a raid6 might be the better option than 1+0 - because in a 4-drive array it doesn't matter which two drives fail
I would like to see these numbers re-run on a significantly newer more powerful machine. I'd also like to see ZFS utilize QAT and other hardware accelerators, GPU compute, etc. because of all the data integrity guarantees it provides, it's a little painful to see it come in last place for performance in these tests.
Yea that can be annoying on a lot of cards. Dealing with licensing is a pain in IT. I often try to stick with free open source products if I can even if I have the money available just to save me the hassle of licensing.
This is a great video! Thank you for all your work on this.
Thank you. I had planned to do something simlar in the future, but this video saved me time. Love your videos, keep it up.
Glad you like my videos and I was able to help answer some of your questions. Thanks for the support!
Great video by the way. Every second was fun to watch. Thanks so much for making me feel at home 😊
My man coming through with exactly the info I seek right now. Thank you sir!
seriously.
Came here to say this, our man is a mind reader.
Another reason I use ZFS is the utilities that it provides, such as snapshots and send and receive. They make it easy to backup and restore. Even with hardware raid solutions, I would still put ZFS on top of the logical volume.
Great video bro! Thanks as always for doing the tests and sharing the results!
I've worked with storage in a professional and hobby capacity for years. This guy knows his stuff
oh yeah, this is SUCH a fantastic channel
Great to see you on my feed!
Great discussion topic. Thanks for a great video.
A very important information needs to be added... These cards are designed for servers, where a lot of air is moved from the front to the back by many fans. These RAID cards get hot and they die if not appropriately cooled. I learned that the hard way! Fortunately and as you told, buying a new card of the same type (LSI 8260-8i) brought all data back. Would be nice to see a test of different 9xxx controllers of different ages and revision to get a feeling about how hot they get.
Looking at the different generations of LSI/Avago/Broadcom RAID cards seems fun. I'll start collecting them if I can get my hands on them. I've always been curious how the NVMe ones work too.
I added a fan to cool my LSI card after one burn out. I also replaced the thermal paste every few years. And yes, I did the same when the card failed. Just replacing the card (even with a newer model) brought the RAID array back.
These tests are fascinating and such a good tool when considering the different RAID options in a system. The analysis is very much appreciated!
This is a fantastic video I'm still learning ZFS I'm a big fan of hardware raid because it just works . Most people that hate on hardware raid have never worked in a professional environment. I run my own personal servers on nothing but hardware raid I source them used from eBay. I always grab a backup controller in the event of an emergency we are talking $50.00 to $80.00 for the used Dell or Lenovo branded controllers.
this was very detailed and informative. thanks for your work
Glad you liked the video and found it useful.
One issue I'm running into in the wild as MSP is clients wanting gig or multi-gig networking, but their drive arrays just aren't fast enough to give them that performance for internal resources. Higher internet bandwidth than local is certainly not the example case I was expecting, but I've been trying to push clients towards better drive performance choices for a while now.
Thanks for helping me to be more prepared! The only luck you get is the luck you build in preparation.
I use TrueNAS Scale on a old rack server I bought cheap on eBay, so ZFS for me. My server is a Dell PE 210 with one processor and only enough room for two drives, which is fine for my needs. The server is quiet enough to live in the same room with it, occasionally the fans ramp up but not for long. It is fast enough to copy or download large files, I mean it might take a minute to get it done sometimes. My network is slow anyways so I'm not sure where the bottleneck is. Loading files like RAW camera files is plenty quick, and I have no problem streaming video to my TV. I do appreciate your work here, I did not know much about hardware raid, so I was mainly watching for info about that. Thanks.
Thank you for this valuable insight.
You can put the journal for mdadm on a faster drive (nvme) --write-journal option as well
That requires a separate drive right? I might look into that in the future, but when planning this video I already was doing a lot of testing and decided to skip the mdadm journal drive testing. I might look into this for a future video though.
@@ElectronicsWizardry I'd love to see some results in the case of a ssd/nvme journal drive!
@@ElectronicsWizardry Definitely a separate drive.
Its in the official manpages but often not really known to alot of people. Thanks for doing the raid testing as well!
It's called a bitmap, not a journal.
@@ElectronicsWizardry that would be interesting, too, esp in the context you’ve established already! Thanks for the info, good layout & presentation
Most newer raid cards use nand and super caps (when power is lost it dumps the raid ram cache into the nand on the raid card)
also some support swappabble cache so if the raid card fails while it was online you can move the cache to the new raid card
This is assuming you have enabling the write cache mode wirch uses the cache on the raid card
Yea I was debating on going into more detail about how supercaps have gotten much more common recently, but decided not include that to try to keep the video shorter.
Do you know the model of one of those cards with remove cache? I've seen some that use a little RAM like board for swapping out the cache, but it seemed to mostly be for upgrading to more cache, not keeping cache form a failed card.
@ElectronicsWizardry need to specifically look for the ones that have NAND on them , the configuration of the hardware raid array (should be stored on it) and uncommited data is stored on it as well so when the card is replaced and the nand module is plugged into new card it should just work or it ask to restore previous/unknown configuration
only thing you must do is make sure both cards are using same firmware (update both cards once and don't update them anymore, this is so you know both cards have same firmware)
The issue with the hardware raid or HBA card not working on consumer hardware will be due to the smbus pins (buy cards that don't use smbus pins or use caption tape over the pins, flashing the firmware for IT mode doesn't disable the smbus as it's an optional hardware feature that can be implemented by the raid/hbsla card, typically dell and hp raid/HBA cards will have the smbus wired on the card for BMC/light out/idrac use)
the issue stems from that smbus is supported by UEFI on consumer boards but the UEFI module is usually missing so the system just hangs on boot or you have random issues with the computer (or it works because the manufacturer of the motherboard actually didn't wire the smbus pins on the pci-e slot it self so the smbus issue doesn't happen).
@ElectronicsWizardry (might have to respond to you on discord or email as TH-cam might automod/delete my post) which it looks like it has delete it
@leexgx I should put more of my socials on TH-cam but my email is probably the best way to message me.
@@ElectronicsWizardry Hmm a raid card with a built in nvme socket for just using for cache/recovery would be nice to see. I wonder when/if that's available now? I've got an old rocketraid 2740 (pci-e v2) x16 card that I really want to upgrade but every time I do a search for "16 port raid pci-e 3" or 4 I don't seem to see many and most are $500 or something crazy.
Thanks for sharing that! I’m also curious about what environmental setup is best for all NVMe SSDs in a RAID. I mean what raid controller or soft-raid should be used? And what file system is the best fit for SSDs? Would you mind making another video to talk about this? Really appreciate it! 👍
There was no mention about BTRFS (as suggested in the title and description) and you missed the most important feature of ZFS: Checksums! A performance comparison is nice but if you care about your data, there is no way around checksums
OOps, edited the title with MDADM instead.
As far as checksums, that is one big advantage of ZFS + BTRFS + others with that support. I will mention that HDDs have internal checksums to prevent data from being read incorrectly, and typically results in a low enough error rate.
@@ElectronicsWizardrynot even close to how zfs approaches though. Like Dr. House always say. They always lie...
So helpful! Thanks a lot!
My question is, what happens, if you introduce a one bit error on a drive attacjed to the hatdware raid controller? You have to put an error on every drive on different locations, to be sure, that the controller does not always overwrite the wrong bit on the parity drive for example. ZFS can handle these errors with its checksum.
When I tested Qnap QTS it could not do that and the NAS returned faulty data.
I think all the hardware RAID cards I know of will assume the data from the drives is correct, so you would see a error in this instance. Since drives have checksumming of the data along with the SATA/SAS interface the assumption is the risk of reading incorrect data is low.
I should do a video looking into this in the future.
@@ElectronicsWizardry there are RAID controllers + SAS drives that have their own checksumming, but it's not something you'll be realistically able to setup in a home lab
I already had data corruption problems with several files being unreadable. I did not recognize it for a longer time period. So my backups were also corrupted. I still do not know, what the reason was. Bad Sata cable, faulty driver, faulty CPU or RAM? As I used hardware raid at that time, it might also have been the raid controller. I value zip files since that problem, because they do a crc32 checksum, so you know immediately, which data is correct.
The main reason why you dont use hardware raid is because of reliability especially with modern filesystems which are designed to work with native raw drives. You cannot depend on hardware raid to communicate properly especially on consumer hardware. Unless the raid card is certified to use with the filesystem like zfs and btrfs stay far away or use it in IT MODE. With hardware raid cards you also have to worry about drivers and software support for your OS.
No raid card is compatible with zfs, as far as I am aware. It needs raw access to the disks, and a raid card will not allow that. Not sure about btrfs. A raid card without the raid ability (just connects disks) is just called an hba.
There are indeed some raid controllers which can be put in HBA mode and therefore pass raw disks to the system.
@olefjord85 that is specifically called IT mode or "hba" mode. Any means of using a raid card in the mode it was designed for (raid mode) means trouble with zfs
@@industrial-wave also JBOD mode
@@SakuraChan00 that's the same thing as hba/it mode, just another different name for it
It's really not easy to compare because there's so many ways to configure a ZFS volume...
The best way is to use XFS.
@@JodyBruchon XFS has redundancy options?
@@frederichardy8844 That's the point of md, duh.
@ so back to compare ZFS to MD and a lot more options of configuration for ZFS than MD. So not easy to compare because depending on the files (size, content) and the usage (sequential read or not, concurrent access, read or write, only access to recent files or totally random) the best is not obvious and obviously not XFS in every case.
ZFS has countless solutions / topologies on how to utilise your drives into a pool.
Very good overview of the various implementations.
I would like to stress that the comments at 1:58 are what really makes hardware RAID unattractive nowadays: hardware RAID controllers have gotten expensive. In addition to the make and model being correct to move data in case of hardware failure, firmware versions are also important here. Back in my data center support days, we always had a cold spare of it laying around and never got updated firmware until it was ready to replace the failed unit. There have been times were down grading firmware was also the path of least resistance to bring back a storage system. Another interesting feature of some select RAID cards is that they off their own out-of-band management network ports with independent PoE power. This permits setting up and accessing the RAID controller even if the host system is offline. The write cache can be recovered directly from the device, archived safely and restored when host functionality returns in a recovery mode. Automations can be setup to copy that write cache on detection of host issues to quickly protect that cached data externally. Lastly a huge feature for enterprises is hardware RAID card support for virtualization. This permits a thinner hypervisor by not needing to handle the underlaying software storage system for guest machines. All great enterprise class features that are some of the reasoning why RAID controllers are so expensive. (That and Broadcom's unfettered greed.)
ZFS has something similar to journaling which can be separated from the main disk system called the ZFS Intent Log (ZIL). Similarly there are options for separate drives to act as read and write caches. Leveraging these features can further accelerate ZFS pools if used in conjunction with high IOPs drives (Intel Optane is excellent for these tasks). Redundancy support for the ZIL, read cache and write cache is fully supported. CPU utilization here is presumably higher but I haven't explicitly tested it. I have seen the performance results and they do speed things up. Common setup is using spinning disks for the bulk storage with these extras as fast NVMe SSD. Costs of using these ZFS features used to be quite high and comparable to the extra cost of a RAID card but storage prices of NVMe drives has dropped significantly over the past few years changing the price/performance dynamic.
5:27 Both hardware RAID and ZFS can be setup with external monitoring solutions like Zabbix for monitoring and alerts as well. In the enterprise world these are preferred as they're just the storage aspect for a more centralized monitoring system. Think watching CPU temperatures, fan speeds and the like. They don't alter the disks like the vendor supplied utilities or software tools, but they do the critical work of letting admins know if anything has gone wrong.
One last thing with ZFS is that it can leverage some newer CPU features to accelerate the parity computations that your older Opteron may not support. With ZFS's portability, you could take that disk array and move it to a newer faster platform and speed things up that way. I would also look into various EFI setting for legacy BIOS support on that i7 12900K board. Ditto for SR-IOV which that card should also support.
Cheaper 10-15yr secondhand (thirdhand?) PCIe-2.0 cards top out around the numbers listed here. For example the Areca 1880 series are incredibly rock-solid cards, and they work just as well without a battery...just make sure the server is on a UPS and shut it down before the UPS dies ;-)
BTRFS raid6 user here 8x8tb. using raid1c4 for metadata and raid6 for data to mitigate the write hole bugs until the raid stripe tree is fully released. performance is ok, and requires very little ram, (2gb-4gb of ram is fine).
I've been using all three options for a little over 10 years now... in up to small enterprise systems, with roughly three generations old hardware.
And I'm still a huge fan of hardware RAID. But... not all hardware RAID is created equal. I've used PERC, LSI, Adapted and Areca hardware RAID solutions... and while Areca has since changed ownership, their hardware has hands down been my favorite, for both ease of use and raw performance.
I personally think hardware raid options have gotten a bad rep, because it's not a system that most people have to deal with on a day to day basis... and recovery can feel archaic and intimidating.
But the shear portability of a nearly platform agnostic hardware raid array, from one system to another is very hard to beat.
My experience with MDADM RAID has been a bit different than what was presented here when compared to ZFS and hardware RAID. I've found MDADM RAID-5 to be nearly as fast as hardware RAID, on similar hardware, even through a SATA port expander.
ZFS, to me, even with a stout CPU, 96gb of ram and SSD cache... still didn't perform as well as the RAID cards I was used to... granted I realize that none of my testing was cutting edge, but this is also the hardware that was available in our budget.
Software RAID is cool, and ZFS data integrity is second to none. But the hardware overhead of software RAID, in my experience, will rarely perform as well as purpose built hardware.
At least that's my two cents.
I appreciate the videos, and look forward to more!
I went ZFS route back when I've setup my proxmox multi vms combo server back in 2021. So far (knock on wood) after 2 separate drives failures no data was lost. Before I was using either proprietary NAS solutions (qnap) or builtin motherboard raid configurations and this always ended up with partial or complete data loss. Thankfully after my first disk failure I always kept some sort of external backup so even though qnap and raid solutions failed me I had some way to restore my data. I'm not saying ZFS is bulletproof (look at LTT situation from couple years prior), but if you do regular pool scrubs and extended smartctl tests - so basically you won't let your ZFS pool to "rot" then I'm pretty sure ZFS is the best their is (so far).
I use a second hand 8 port hardware raid card with the raid bios removed so just passes the disks .
I've been using ZFS for the last few year so almost maxed out my 5 drives , so can't wait for single disk expansion which should be here soon.
Been using hardware RAID 6 and so far I'm pretty happy with it. Initially started with RAID 5 for the speed because I planned to also use it for games but a faulty 3.3v adapter and getting a good deal on 2 4TB PM1733's caused me to switch to RAID 6 for peace of mind. Having the ability to up and move to Linux whenever it gets good without any hassle was a huge reason as well.
The only solutions I found for cross OS's was a VM which was out of the question because of gaming and an entire other PC which was out of the question because lol no.
You should _always_ use a form of ssd as a log device for a raidz1 or raidz2 if you want a decent performance. An alternatve is to force the pool to be asyncrons, but then you can lose up to 5 seconds of data. One of the best log devices you can use are smallish optane drives, just avoid the 16GB ones since their sustained write perfomance is to low.
Excellent suggestion.
Though when one starts the sentence with the word performance, why use raidz instead of mirror stripes...
MD raid you seem to have journal enabled (possibly bit mapping enabled as well as that avoids you having to do a full rebuild when a drive is removed and re plugged back in or a unclean shutdown)
Unsure if it's a feature of mdadm but on Synology it can skip Unallocated space on rebuild
If you can’t get into the pre-boot menu on a consumer board it might be because you didn’t go in and enable that function. You might also have to disable fast boot. Specifically there is an option though that enables the pre-boot screen to show.
It's called Option ROM, and there are Legacy and EFI ROMs. Typically older cards don't have EFI compatibility, which is why they won't show up at boot.
Really interesting video, thank you!
Glad you enjoyed the video!
I would like to see object storage like MinIO tested
Very pertinent. I switched from a 4th-gen Celeron-powered build with MDADM (and an extra PCI SATA card) with 6 disks, to a PowerEdge R510 with an H700 Perc card. MDADM was especially slow with writes, much slower than the Perc. However, Raid-6 performance isn't impressive either.
Performance is important, of course, but for some of us the power usage might be another important factor to consider.
Some time ago I was using HP P410 RAID controller and it was increasing my server's idle power consumption by 20W. That's why I decided to switch to software RAID based on ZFS.
that thing is an abomination.
@@chaoticsystem2211 Why, im still using card and integrated with raid 6 configurations. If guy cant effort better, you can have dl380 g7. It works with ssd drives for at least 7 years now.
That hardware RAID card is freaking old. It's a really bad comparison for 2025 - The difference between PERC 11 and 12 is huge in itself. Should do this again with a H965i to show what 2025 hardware can do.
Yeah! I would never use older than H730 cards.
Its an OK comparison since ZFS and MDADM are running on an ancient platform. Also because it can be purchased for 20$. Makes it easier to compare since ZFS is free.
Surprising results! I was of the belief that software raid is just as fast as raid. I was wrong!
The opteron server he was running these tests on is very slow. Modern cpu's are so fast now that software raid would probably show much lower utilization. The comparison with old hardware raid cards is tricky because they likely won't run on modern uefi motherboards, especially consumer ones. He kind of said this at the beginning of the video.
Have an Adaptec ASR-8805 with BBU in a consumer ASUS motherboard in UEFI mode, running for several years with Proxmox, no problem. Bought 2 of these controllers they were so cheap in case of issue.
Be careful with the results of this video. The system he's using has a very weak and underpowered processor in terms of single core IPC and without any additional information his performance problems are most likely localized to his system configuration. I've been using mdadm for years on a variety of different configurations and the only caveats were slightly lower. Random read speeds and in some cases higher disc latency than hardware RAID arrays but other than that, as long as you have a decent CPU (a 10-year-old optron is hardly a good test candidate) It's a viable solution for Budget conscious individuals. Fun! Fact: Intel maintains the entire software stack for mdadm and it's implemented as part of their Intel VROC solution on high-end server grade motherboards.
Just switch from HW Raid to ZFS runs on SAS3008 HBA,
I used to had weird issue with HW raid card for about to 2 years. The kernel driver just failed for some reason,
I try to fixed it, spend hours surfing on forums , played with BIOS(UEFI) setting , the issue were gone for mouth, But it went back recently.
Then i start to thinking i might have faulty HW raid controller, but i'm still not sure lol, So i get a HBA setup ZFS, I'm so impress the functionality from ZFS.
One of the best decision I've made.
I don't know if it was posted already or not, but if you are able to set the "Storage Oprom" to legacy on consumer boards, that will allow you to use the built-in managers for the different raid cards. But given the move to uefi on everythnig, its a dying thing to see legacy features on newer mainboards.
MDADM and ZFS have the advantage that if the hardware dies, your data is easily recoverable - a new Linux PC, USB adapters and such can re-assemble and mount the data quickly and easily. RAID controllers can be fussy often needing the same firmware version so you need to buy 2 cards and keep them in sync so hopefully there's a chance you can take one set of drives and put it with the other controller. MDADM and ZFS have the ability to auto-probe and assemble drives. It's all about what happens when failures occur, and while MDADM and ZFS performance doesn't match hardware, knowing that all I need is a Linux live boot image, a spare PC and some USB adapters (if necessary) means I have easy access to the data should the Linux server decide to fail.
2:04 Wow! a PCI-Slot! haven't seen one of those in a while
Your results agree with my experience. HW Raid, for small and medium systems, is not needed. I still see it for large enterprise SAN nodes, but not in this classic form. And, at least for desktop type machines, the move to solid-state storage has changed the performance equations yet again. But assuming we're talking bulk storage, AKA spinning rust, I would use ZFS over just about any other choice, especially in a NAS application.
Yea SSDs change the performance calculation a good amount. With HDDs its much more common to be IO limited than with SSDs. I choose HDDs for testing here as there much more common in home server and NAS use, and since HDDs do much worse in some workloads like random IO I thought it would be best to test with random IO. With how well this video is doing, I might look at SSD arrays, and try to get one of the NVMe raid cards to see how they work.
@@ElectronicsWizardry Yeah, that might be fun. One of my servers has an LSI card that does NVMe. No RAID in hardware for that, but the Kioxia U.2 drives I have it hooked to give insane I/O throughput. Perhaps you could do a ZFS comparison with using the SSD's in conjunction with the HDD's, either as special devices, or just cache. (And why cache drives aren't what most people think when it comes to ZFS)
Yeah, ZFS (even stripe) with NVMes and 4 fio threads with fsync=0 just max out the CPUs, or for practical case couple of VM guests maxing out their io on same zfs pool - same results on host CPU. For HDDs would choose zfs any day
Yes, solid-sate storage has completely changed the game for where you would need to use RAID.
As for a NAS..... ZFS is fast becoming the filesystem to use.
But nobody talks about SNAPRAID. I've been using that to store my media files for years. Never had any issues with it. And best of all, it only spins up the drive where the data is stored, rather than every drive in the array.
@Andy-fd5fg yea snapraid is flexible and does well in home media server like environments. It struggles with lots of changing files and only operates at the speed of a single disk. Unfortunately there is no perfect storage solution so it’s a pick your compromises when setting up raid or raid like solution.
The reason I use ZFS in every computer I own is because I can have a Hard Drive for bulk storage, NVME for Special Device and ZIL and RAM for fast cache.
One of the main reasons not to use hardware raid is that you might end up dependant on that specific card. If it fails and a replacement (likely second user) card cannot be found, then the data on the drives could be inaccessible (at least without having to pay for expensive data recovery services). Software RAID gives you the hardware independence needed for arrays that might still be in use in a decade’s time.
THAT!
another great vid
Thanks! Glad you enjoyed the video.
first raid card was 29160 purren 10 thousand rpm, drives were crazy expensive back in day, used in DAW station, what the average drive & lappy can do today is amazing, at low cost. awesum vid - dank yoo
HW RAID is only good if I have another card on hand when one fails. And even then, can they be easily swapped in?
Typically a replacement RAID card will detect a array and import it. Generally if its from the same manufacture with the same model or newer you can import the array. But the hardware requirements are much stricter for importing the array than software RAID solutions.
I generally like to rely on backups in case something goes wrong with the whole array, but having ease of recovery is still a feature if things go wrong.
@@ElectronicsWizardry What if data was in-flight when HW RAID card failed? Would entire array become inconsistent?
Isn't it the same concern as write hole problem?
Thanks for the video btw, it cleared up some concepts for me
@@keonix506 Unfortunately I don't have experience with a failing RAID card, and there isn't a easy test this I can think of as I don't want to physically break a RAID card. My gut is that a RAID card failure mid operation would cause data in flight to be lost, but likely keep the array still recoverable on a different card or with data recovery utilities. From what I've seen design wise, I think the assumption is that RAID cards don't fail that often so there not designed to be easily replaced hot like RAID cards, but unfortunately I've seen a good amount of RAID cards fail in my time(I don't recall this happening during operation though)
Thanks Brandon.
From what I understand, a hardware RAID generally doesn't allow you to use for example an NVME SSD as a cache. So wouldn't a software raid with such a cache generally surpass performance of hardware raids? Especially for random reads/writes?
I think a few Hardware raid cards supported a SSD cache, but this feature has since been removed in newer product lines to my knowledge. You can still add a cache in software with something like bcache in linux.
The annoying part of a cache is they can help a lot in some workloads, but almost none in others. If your doing random IO across the whole drive, expect almost no improvement from adding a cache as it would be nearly impossible to predict what blocks are needed next. If your accessing some files more than others, a cache may help a lot.
I'm generally a fan of a sperate SSD only pool if you know some files are going to be access more often than others. Like a SSD pool for current projects, and a HDD pool for archive projects. But this can add complexity and depends on your exact workload.
How does HW raid card stand against MDADM with dedicated NVME (or ramdisk) journal and bitmap device?
I didn't test the journal device for this video(I already spend a long time setting up these arrays and rebuilding them). I might look into mdadm journal devices later if people are interested.
@@ElectronicsWizardry ZFS can have a special vdev on nvme, which helps a lot in certain cases. metadata is always stored on nvme (crazy fast directory index loading!) and you can choose (per dateset if you want) up to which block size shall be stored on nvme instead of HDD. attention: if blocksize==recsize, everything will be stored on nvme.
I have one suggestion for you about this video. Add time stamps.
Are we even have some decent hardware raids in this times?
I really liked your video but ... What happens if the volume or array runs out of space. Can i just add another drive and keep going? Unraid can do that.
With parity RAID, most hardware raid cards support adding a array, MDADM has supported adding disks to arrays for a long time, and ZFS has added this feature recently. In all of these examples the drive has to be the same size as the existing drives(larger drives houdl work but the extra space won't be used)
Zfs can do that
@mimimmimmimim Unraid can add a volume of any size up to the size of the parity drive. Nice thing about Unraid. Not restricted to having all the drives the same size.
Yea unraid is super flexible with its storage and adding more later. Unfortunately each storage system has compromises and I think Unraid has limited speed compared to other solutions as it’s often limited to the speed of one disk.
MDADM doesn't have a journal... the filesystem you use on it has the journal.
I'd suggest you do some tests using different filesystems..... BTRFS, EXT4, XFS.... there maybe others you want to look at... is JFS still kicking around?
And as others have pointed out, you could move the journal to SSD's perhaps a mirrored pair.
Also look at MDADM "chunk" size. Many years ago when i was playing with MDADM and XFS i had to do some calculations for what i would set for XFS "sunit" and "swidth" values.
I expect BTRFS has something similar.
(Sorry, can't remember the exact details of those calculations.)
I think I understood how MDADM does its journal, I thought(incorrectly it seems) that it always uses it like ZFS log, but it seems to only use the journal when a journal device is connected.
I think I tried a few other filesystems and didn't see a performance difference. Since I was mostly trying to test RAID performance I stuck with XFS as I didn't see a big difference between filesystems and fio performance, and wanted to keep the tests rounds down(It took ~3 days for each RAID types to be tested as I had to wait for the initialization, then write a 15TB test file, then do a rebuild)
I should check MDADM chunk sizes, that easily could have been the issue here.
@@ElectronicsWizardry Sounds like you need to acquire some smaller drives, 1tb perhaps.
I know they aren't good for price per TB, but it would save a considerable about of time for tests like these.
Even if you don't, a follow up video testing just XFS with a separate journal drive, and tweaks to the chunk size to get better performance out of MDADM would be a good topic.
I do have a pile of 1TB drives. I should have remembered to use them instead as the whole drive can be overwritten faster.
It seems like looking into MDADM would make a good video and I'll work on that in the future, It will be a bit of time though as I have some other videos in the pipeline.
@@ElectronicsWizardry Make it whenever you can..... until then we will all look forward to you other videos.
MDADM has a "bitmap" which allows it to only resync the recently changed data when a device fails and then comes back, but it's not a journal. But Device Mapper has dm-era module that does something like this, I think.
How about intel vroc ?
Isn't that only for nvme? I'd say SSD performance in a RAID is high enough, even in a pure software solution.
Vroc to my knowledge is for NVMe only drives on specific platforms. I'm not sure how it does performance wise, but I think it uses a bit of silicon on the CPU to help with putting the array together for booting from, but I think much of the calculations is still done on CPU cores with no dedicated cache. I might make a video about it if I can get my hands on the hardware needed for VROC.
The problem with hardware raid is when something goes bad, no tools ... RAID5 and 6 with mdadm are terrible, especially with fast storage. Some work is being done to fix that but when will that be included in the kernel is anyone's guess. For now disable bitmaps and try to use power of 2 number of data disk (4+2 for RAID6 for example). That should fix some of the issues you are seeing.
It would be interesting to see how Microsoft Storage Spaces performs in this test 🤔
I should take another look at storage spaces. Its been a bit since I've done a video on storage spaces, and I think server 2025 changes some things. I decided to skip it here as I am more familiar with Linux and adding a second OS to testing adds a lot of variables.
Been using it for years on a workstation as a extra backup point. No issues. Works great.
Does anybody knows which codename Debian 18 will have and what codename will be chosen when they are through all the Toy Story characters? 🤪🤔
I don't think its been announced what Debian 18's code name will be. The latest codename I think they have public is for 14 with Forky. I'm guessing they still have a lot of toy story characters to go through.
My hardware raid card works funky with my consumer grade motherboard. Add it's a 10 year old motherboard and you have all sorts of funny. But it works.
I'm using btrfs, even though I'm using the experimental raid5 feature... What really drives me to use btrfs is snapshots, without the history of zfs.
Wait... ZFS has snapshots as well. I'm confused about what throws you off about the history of ZFS. Please eli5.
@@RedDime10 Sun is evil. It's older.
@@cheako91155 and BtrFS's raid56 implementation is broken
so not just does zfs not suffer from that issue but also has snapshots - that's already a 2:0 against your nonesense reply
as for sun: they were bought by oracle a long time ago and hence no longer exist
Just my 2 cents about Hardware RAID controllers with spinning rust.
If you are doing anything live and you need to ensure that you have real world up time , the hardware raid controller is by far and hands down the way to go.
They can be an absolute pain to set up and you do need to have a spare card on hand for emergencies that do come up.
Look at the NAS Minisforums has coming out! Its legit.
as you are using a Perc card which is a rebranded OEM LSI MegaRAID 9361-8i, if the RAID controller fails, and you are using Linux you can use an HBA + MDADM to import it and run it like normal to recover any data you need till you get another RAID controller with the same firmware
haha I've actually been booting my Windows Server off a RAID 1, 2-drive array for like 9years. This is on a Dell R510 with a H700. I've never heard you can't boot from RAID, but then again I've also never tried using software RAID
I am running unraid with two zfs pools. I can get almost full Read and write speed out of my exos hdds. So Software raid isnt a bottlenecks now days if there is enough cpu power available
But there is a word about fault of raid card. You will need to find exactly the same card with same firmware to restore you system
Linux md + XFS is the only way to go. All other solutions are inferior. ZFS sucks and is slow.
Bro, where are the SSDs?
I stuck with HDDs here as there most common in home server and NAS use. SSDs change up a lot of the performance calculations as they are so much faster things like the CPU and bus speeds are much more likely to be the storage limit than the disks themselves.
There is so much to ZFS that I don't understand, that it seems too dangerous to me. I also don't like how they don't focus one bit on performance. You can now expand RAIDz, but doing so is nothing like expanding traditional RAID. Performance drops with each drive added. It is not a complete resilvering. Dumb.
Great video! My few cents on this topic
1) Hardware RAID with writeback cache and single card is pure evil (TL;DR just don't do that) - whatever redundancy you get from RAID geometry is swept away by reliance on a single piece of hardware. When that controller fails, your data is in an unknown state, because some writes might be in the cache and thus lost. Also, when that battery fails, performance drops off the cliff because you can no longer use writeback cache (if you value your data), at which point if you plan for _not_ having the cache, then why rely on that piece of hardware at all? There are ways to do this properly with multiple cards that provide redundancy and synchronize their cache, but that's usually vendor specific and I haven't had much experience dealing with them, those make sense mostly if the setup is large, like multiple SAS expanders with tens or hundreds of drives (and usually hosting a huge Oracle instance or something like that).
2) MDADM is great for OS partitions because of its simplicity, but also for performance-sensitive workloads like databases and is extremely flexible nowadays when it gets to expansion or even changing between RAID levels (if you are ready for the performance penalty when it's rebuilding). But it lacks the features that make stuff like ZFS great, like snapshots (yes, I know you can use DM for those, but has anybody actually tried using that productively?)
3) ZFS is what you use if you actually value your data because of all the features, like streaming replication, snapshots, checksums... but you lose a lot of performance because of CopyOnWrite nature of the filesystem. This is what I use as long as I can live with that performance penalty. This should just be default for everyone :)
4) BTRFS - just don't (unless you are an adrenaline junkie). Every single BTRFS filesystem with more than light desktop use I ever encountered broke sooner or later. Very common scenario is colleagues claming BTRFS works just fine for them and me discovering I/O errors in kernel.log that were simply overlooked or ignored and resulting damaged data that were never checked.
5) there are many seldom-used but very powerful options for gaining performance, for example putting your ext4 filesystem journal on a faster SSD or adding a NVDIMM SLOG device to your ZFS pool, it might be interesting to see a video on those :)
Some more notes on your testing
1) FIO is complicated, IO engines are a mess. I spent good few months trying to get numbers that made sense and correlated to actual workload out of it. Sometimes you're not even testing the bottom line ("what if everything was a synchronous transaction") but just stress-testing weird fsync()+NCQ behaviour in your HBA's firmware. Not sure I have a solution for this, testing is hard, testing for "all workloads" is impossible and you can get very misleading numbers just because you used LSI HBA with unsupported drive, which breaks NCQ+FUA, which makes all testing worthless (ask how I know). Btw LSI HBA in IT mode is not as transparent as an ordinary SATA controller.
2) CPU usage ("load") might not seem that important when what you're storing are media files, but when you're running a database it translates into latency... except when it doesn't, which is what I suspect happened to you, because FIO wasn't waiting for writes to finish (so not blocked on I/O) which is what "load" actually measures. On the other hand, if you throw NVMe drives in there then that CPU overhead will start making a huge impact inside your database/application which can just throw away that thread immediately etc..
3) HW RAID controllers are capable of rebuilding just the actual data in an array, but that requires a supported hardware stack (drives and their FW revisions) and functional TRIM/discard in your OS.
4) ZFS performance testing is usually non-reproducible with faster drives or when measuring IOPS for a couple of reasons like fragmentation, allocating unused space vs. rewriting data, meta(data/slab) caching, recordsize or volblocksize vs. actual workload and weird double caching behaviour on Linux. Just try creating a ZVOL (let's say 2 times larger than your testing writeset) and then run the same benchmark multiple times on it, you'll start seeing different number on every run, trending downwards :)
If you need the reliability of drives connected to two raid controllers you need a real dual controller storage array
Thanks for the feedback.
I noticed the card in IT mode seems to be realize it cache to help single disks. This affected my testing and trying to get good results, but in the real world this should only help(unless there is a bug with the cache, but I guess this could affect any disk controller, the cache on the card is error corrected)
The more storage performance tests I do the mode complex it becomes and more variables I realize I have to control. I know these results are far from perfect and can't be extrapolate to every use case, but I hope these numbers are better than the often vague descriptions of performance I see online, like slow writes, or no practical CPU usage difference. I hope others can use these tests as a jumping off point to running more tests specific to their workloads.
Load averages likely weren't the best for seeing the CPU usage. Looking back running a CPU benchmark at the same time might have been a better metric. It was pretty reproducible though in my testing.
@@ElectronicsWizardry IT mode shouldn't use cache at all. In the past I used IR mode with RAID0 of single drives to get controller-based writeback cache (backed up by the battery), but in IT mode this is not possible.
Just beware that IT mode is not transparent at all. I am biased against LSI adapters based on my past experience, but they have sort of become the standard for home NAS/homelab setup. For small setups, I think it's better to get a cheap SATA controller. (And lots of AMD boards can switch U.2 ports to SATA mode drectly to the CPU which is awesome).
@@zviratko I might test it again, but it defiantly seems to used the RAM as a read cache on that card when set to IT mode. I should look into this more and might do some testing in the future
Do you have a model of sata controller you suggest?
BTRFS raid0 is awesome
this is nonsense the charts clearly show he is testing the controlller cash not the disk performance. thats why writes are so much better with battery
All the tests were done over a ~3m period, so the writes couldn't just be dumped to cache without testing the whole array performance. The cache helps with sustained write performance, so I'm far from jsut testing the cache..
ZFS also cares about your data by doing checksums. Hardware RAID cards lie a lot and will happily spew out any garbage they have written earlier.
Or just use SeaweedFS and stop waisting HDDs in RAIDs. Also where's the data for BTRFS RAIDs?
I skipped BTRFS as its RAID 5 and 6 solution isn't listed as fully stable yet. I like how BTRFS RAID is very flexible with adding and removing drives and mixed drive size configs.
So hw hba cards are only used nowadys in servers in something like a multi cluster jbod atorage set up where you might want redundant data paths... Reason md raid was slow in your dual socket is probably due to the fact its old. Everything uses md raid in the server world now.
Hardware raid is really firmware raid, its software “burned” onto a chip. Does that sound like it’s easy to update? It’s software and it will need to be updated. Remember when firmware was a joke? Its software written to the “eeprom” so between software and hardware is firmware. LOL.
Be careful with the results of this video. The system he's using has a very weak and underpowered processor in terms of single core IPC and without any additional information his performance problems are most likely localized to his system configuration. I've been using mdadm for years on a variety of different configurations and the only caveats were slightly lower. Random read speeds and in some cases higher disc latency than hardware RAID arrays but other than that, as long as you have a decent CPU (a 10-year-old optron is hardly a good test candidate) It's a viable solution for Budget conscious individuals. Fun! Fact: Intel maintains the entire software stack for mdadm and it's implemented as part of their Intel VROC solution on high-end server grade motherboards. One other caveat is nearly all modern hardware RAID solutions do not scale with high-end solid state storage devices and other nvme drives. Using these type of cards will bottleneck pretty quickly and experience diminishing returns. I've worked with distributed storage systems for years and stopped using hardware raid arrays almost a decade ago.
That opteron server might not have been the choice here as it was probably getting to be too slow for a realistic test bench. I was trying to stick with a slow CPU and slow HDDs as that seems more common in NAS units and home servers. SSD scale differently in performance but I stuck with HDDs for this test. Unfortunately no single platform can be used to extrapolate results for every use case, but I try to get the best results I can with the hardware I have.
Hopefully I can get something like a LSI 9560 and a few NVMe drives one day to see how a modern NVMe RAID card compares to software RAID on a high speed CPU.
During my testing I didn't notice the CPU threads being maxed out and the disk usage was being maxed out.
Err... Windows have no problem booting from a raid like volume, and have been able to do so since Windows XP days... So that's just plain misinformation you're spreading there. It's a bit fiddly to set up, but it's very much possible.
How can Windows be setup to boot from a software RAID volume? I've check a lot of sources over the years and yet to see a good way to do it. Storage spaces isn't bootabe to my knowledge(I think it was setup in a bootable matter on a surface system, but I don't think that counts).
@@ElectronicsWizardry You're right that storage spaces isn't usable for booting yet. Windows does however have the older raid like tech that was introduced in XP named Dynamic Disks, which can be. It supports simple, spanned, striped, mirror and raid5 volumes, and most importantly here, Booting from it is supported. But, for all usages except booting from a mirror set of dynamic disks, it's also so old that it's deprecated, but not yet removed either. It's even still there in the current previews of Win12, and probably isn't going to actually removed until Storage spaces drives actually are bootable.
As for how, just install on one drive. Once in windows, convert to dynamic, initialize another drive as dynamic (but no partitions), and extend the system drive over to that drive. It does however NOT work with dual booting, because it needs both the MBR and boot sector of the partitions to work. Normally you set grub to overwrite the MBR and chainload the bootsector version, but that does not work with dynamic disks.
Yea your right there are ways to do a mirrored boot, but compared to Linux the options are limited, its mirrored only, and uses a deprecated RAID method in Windows. I think its reasonble to say that a RAID card can a good option if you want to boot windows from a RAID array.
Thanks for the reply
@@ElectronicsWizardry No no. You misunderstand. Only mirrored boot is still what MS considers current with this method. You CAN boot any dynamic disk type, including raid5 style, and the method as a whole, is not deprecated. It's only for OTHER USES that it's deprecated. As you, I'd still recommend a raid card, that's not the point. But recommendations are different from possible. My point is simply that you shouldn't claim it can't do it when it's very much possible and has been for almost a quarter of a century.
@danieljonsson8095 let me give that a try then. I ran into issues tbh at last time I think I tried this method. Thanks for the correction.
lvl1techs made a great video on this why you should not use this, This is not a viable solution due to it not detecting write errors and bitrot. The current hardware raid cards are not the raid cards from the past th-cam.com/video/l55GfAwa8RI/w-d-xo.html
I watched that video when it came out, and probably should have talked about this issue more in the video. While this is a potential issue, It generally seems to be pretty rare in practice due to the error correction on HDDs/SSDs. I have used many hardware raid cards with data checksumming in software and almost never get data corruption/bitrot. Zfs and other checksummed filesystems are nice and help to keep data from changing, and notifying the user upon issue. My general experience is checksum errors on HDDs is extremely rare on a drive that doesn't have other issues. Keeping bitrot away is one reason why I generally stick with ZFS as my default storage solution, and try to use ECC if possible.
@@ElectronicsWizardry ECC will not do anything against Bitrot, Bitrot refers to the gradual corruption of data stored on disks over time, often due to magnetic or physical degradation. ECC memory primarily protects against bit flips in system memory, not storage, so it does not prevent bitrot.
ECC memory for ZFS is recommended for not writing any bit flips to disk, this is even a more extreme measure since bit flips in memory occur even less then writing errors so to on 1 hand being very lax about data integrity using hardware raid cards but then suggesting ECC memory for ZFS on an even less likely issue is very strange.
HDD's dont have any error correction, they just write the data it gets, it does not have any way to even know what data would be the correct data so how can you even start to do error correction on it, the filesystem is responsible for that by doing checksums. (yes a HDD has ECC but this is for reading data from the instability of reading such tiny magnetic fluxes)
You should rewatch the video, specially at 6min timestamp, there are no cards that do any checksums anymore
" I have used many hardware raid cards with data checksumming in software and almost never get data corruption/bitrot" this statement is 1 to 1 equivalent to your grandma saying "I got the age of 90 with smoking a pack of cigrates a day" Good on you, but those empirical statements are kinda useless and specially as "informing" people it only hurts, since i bet you, atleast one person is going to buy a raid card in 2025 because of this video
@@ElectronicsWizardry After Advanced Format drives (512e included) came out, all the different DIF formatted drives became much less attractive because the how powerful the new ECC algorithm AF drives uses (LDPC on 100 bytes per 4k physical sector). This is why most modern hardware raid controllers don't use DIF formats; because it's been made redundant by the checksumming that is running on the drives themselves.
Yea ecc doesn’t help with bitrot but you want all memory and interfaces in the data storage system to be error correcting to prevent corruption. Ecc and disk checksuming help with different parts of the data storage pipeline.
Hdds do have error correcting that is hidden from the user available space. That’s how drives know if there reading the data correctly or not. Also interfaces like sata and sas have error checking as well. The assumption raid cards make is the interfaces used by drives and the data on the disks is checksummed so the drives only return correct data or none at all. There are edge cases where this can occur but it’s rare in my experience and I’ve used a lot of drive so if this was common I’d guess I would have seen it.
Also a huge amount of business servers are running of these raid cards. If there was a significant data bitrot risk I’d guess this would have been changed a while ago.
@@ElectronicsWizardry I definitely agree that it would be ideal to have ECC throughout the entire pipeline of subsystems (all the SerDes structures and all memory levels) the data goes through, but adding complexity to a design can also introduce new sources of errors; there is probably some ideal balance between simplicity and error checking, and I'd be willing to bet we're pretty close to it in most systems.
I suppose bitrot is kind of an all-encompassing term, but the ECC that runs on hdds is being used to checksum every sector that is read, so it would protect against some kind of surface defect or magnetic bit problem below LDPC threshold on the hdd causing bitrot, but it'd be narrowed to correcting only those sources of bitrot and not some kind of wider memory or cabling issue. This is all mostly transparent to the user like you said, you'd have to do into SMART to look for reallocated sectors to discover this was happening.
2 minutes ago 😅
hardware raid is dead! ceph/zfs leads everything. i went away from both software and or hardware raid now for about 4 years. im happy with my zfs in my homelab!
Yea I'm a big fan of ZFS/Ceph for much of my use I go with ZFS. For many of my uses squeezing a bit more performance out of a array is less important than features like ZFS send/receive and I am very used to how to manage ZFS.
Also next week's video is gonna be using ceph in a small cluster so stay tuned.
@ElectronicsWizardry The redundancy and ease of management that ZFS offers are truly unmatched compared to traditional RAID. If you're replicating something like RAID0 or RAID5 and want to avoid sacrificing performance by skipping a ZRAID setup, there’s absolutely no reason to stick with RAID. With ZFS, you get almost the same speed but with far better security!
For me, proper error correction is non-negotiable! I refuse to rely on systems that don’t (proper) offer it. While I do like BTRFS, it’s still a no-go because I prioritize both security and performance. Unfortunately, BTRFS RAID setups, as I would need them, are still too unstable. Honestly, there’s anyway no reason to move away from ZFS, even if BTRFS eventually becomes stable for certain RAID configurations.
Btw, i like your recent style change with this fancy beard! :-D
MinIO is also awesome. I have all my media stored on a MinIO cluster
Only doom beats ZFS.
Nothing is as reliable and versatile.
Else...
I've exported imported zfs pool between several oses, HBAs, mainboards over the year even one on an rpi4 and even on Windows. Nothing beats the flexibility of ZFS.
By the way I just don't get it why people are stuck using raid levels. Raid 5, 6, etc, adrenaline junk much?
Especially with the ability to enlarge pools both vertically and horizontally, ZFS mirror stripes (kinda like 1+0 but exaggerated) are both flexible and performance wise preferable.
Especially when it comes to resilvering after disk changes, mirror vdevs do not put stress on the cpu and the rest of the drives on other vdevs. They do not require massive amounts of hash calculations like raidz types do (as well as raid 5, 6, 166 etc )...
3-way mirrors combined into a 9 disk pool for example is a beast.
Or you can be less greedy and prefer 3 2-way mirrors striped together.
Additionally zfs supports several kinds of cache additions to a pool.
Using an nvme for read cache, another for writes, another for metadata and more can be done in order to drastically enhance performance.
One more thing, mirrors do not hit cpu as raidz modes do. Even mirrors stripes are this way..
Still, since every data block in zfs, has their checksums calculated and stored, there's still more cpu overhead with respect to others.
@@mimimmimmimimas for why people stuck with old raid levels - because the industry stuck with them as well - and as they just define basic priciples they're still valid today:
raid0 - striping
raid1 - mirroring
raid5 - single-parity
raid6 - dual-parity
also: for zfs one has to know that a pool of 2 or more vdevs is an implicit stripe - and hence a pool of N mirrors still is the same as raid1+0
in fact every zpool is a raidN+0 - so if one vdev fails the entire pool fails
that said any raid1+0 CAN take up to half of its drives fail as long as only one drive per mirror fails
but a raid1+0 will also fail from one double failure when both drives of the same mirror fail
so depending on your usecase a raid6 might be the better option than 1+0 - because in a 4-drive array it doesn't matter which two drives fail
I am glad that the YT algorithm recommended my this channel. Nevertheless you should hit a gym or at least do 20 pushups a day.
I would like to see these numbers re-run on a significantly newer more powerful machine. I'd also like to see ZFS utilize QAT and other hardware accelerators, GPU compute, etc. because of all the data integrity guarantees it provides, it's a little painful to see it come in last place for performance in these tests.
One of the biggest problems with hardware RAID is, that more sophisticated RAID modes are locked behind an extra licence purchase.
Yea that can be annoying on a lot of cards. Dealing with licensing is a pain in IT. I often try to stick with free open source products if I can even if I have the money available just to save me the hassle of licensing.