Dropped in to add this comment, really like the style of this !! Don't get the expected results and get a real expert to explain in plain English giving real world examples. Very useful and informative.
So one question I would have (just starting my home NAS journey) is how does the SSD affect some of the recommendations starting at 11:38? In particular, the starting machine I have has 8 available 2.5" bays, and I'm going to start with 2 vdevs in RaidZ2 with 2 TB drives. But I thought about using SSDs instead.
FYI: Optane nvme (U.2) is basically ideal for slog. I use two of the cheap 16GB NVME versions in a stripe to decent effect (boosted ~30MB/s sync writes to 250MB/s sync writes (till they fill.)) I am more concerned about unexpected power loss than those drives failing. Obviously a mirror would be wiser.
Did those tests a few years ago with NVMe and got to the same conclusion and explanation. For the enterprise with big data access, you will need those caches, but for labs or small enterprises, the best optimization to get is ADD MORE RAM. That's it! Once you understand this, everything is quite simple.
Coming from a classic RAID background in IT, I support the use of RAID10 (stripe of mirrors) or equivalent. Sure, it's 50% usable capacity, but i/o is fast and resilient, plus a drive replacement takes as long as a single full disk read and write (since that is all it needs to do), meaning you get back to nominal quicker than other raid configs. Just remember that RAID (including raidz) is not a replacement for backups. Tip: Backups are a complete waste of money, until you need them, then they are worth every single cent ten times over. Tip2: if you want to use truenas you will want to start with at least four drives, two cheap and small ones for the truenas OS (mirrored), and two (or more) large ones for data storage. You cannot use the OS drives for data storage (this is not a bug, it is by design). You can skirt this a bit by using virtualization for the truenas OS (installed to a virtual disk) and passing the physical data drives to the truenas vm to use directly (just don't use the vm capabilities of truenas scale as nested vm's is a whole new level of WTF). Took me a long time to fully understand this, hope it saves someone on the internet some time.
Note that, like most of these videos, the zil/log is being confused with a write cache. It's frustrating to see iX reps reinforce the misunderstanding. It is not a write cache, it's a write backup. As long as the write works normally, the zil will never be read at all. ZFS write cache is in RAM (transaction groups), the zil or slog backs that up for sync writes in case the RAM copy is lost. A poorly designed slog will slow your pool down, since it's NOT a write cache, you are adding another operation.
Exactly, the benefit of the SLOG is that the system can reply "sync write done" as soon as this backup is written, while it will not send that reply as long as the data is only in RAM. So the physical write process is not accelerated, but the overhead imposed by sync writes and the required replies is moderated by a very fast SLOG device that will allow much quicker replies to be sent.
@@Burnman83 except it is accelerated because it's been written to your fast slog device, which is just as good as writing it to the spinners. it IS effectively working as a cache also.
@@charliebrown1947 Yes it accelerates the capability to reply positive to a sync write, because writing to slog will already trigger the "ack", but it is NOT acting as a cache, because in that case it would delete the written data from RAM and then transfer from SLOG to disk. Afaik this is not what is happening. Data is kept in RAM and written from RAM to SLOG and disk in parallel. The only advantage is the delta between the SLOG SSDs acknowledging the write vs the disks, which means the smaller the files the more acceleration and vice versa, but no, it is still not a cache ;)
@Burnman83 once the data is written to the slog it is as good as written to the pool. It can be purged from arc and it is literally behaving as a cache. I don't know why you're trying to say it isn't. The data is cached in the slog and written to pool later.
@@charliebrown1947 ARC is RAM-based read cache, but I know what you wanted to say. From all I know this is not the case, SLOG is never read, only written and in case of a disaster, power outage or whatever read later during the remediation. Otherwise data remains in RAM, effectively blocking it and slowing down the transfer when you are running out. That means, big SLOG is not worth it, as it will only help for bursts anyway. If you had a piece of vendor documentation for me telling otherwise, please go ahead and prove me wrong, but until then I stick with what I learned, and not with what I think would make sense. This is by no means meant as an offense, I'd love if you were right, would give me all the tools I need to build lightning fast SLOG and force sync on all writes and like that have permanent insane write performance, but I'm afraid this is indeed not how SLOG works.
This was extremely helpful. I'm in the process of doing my homework to migrate my home server to TrueNAS and was planning on spending on SSD's for caching. Now I know to get more memory instead.
I've got 8x 4TB drives in RAIDZ2 and 2x 500GB SSD. I need NAS mostly for Plex (movies and music) and Nextcloud/Piwigo (photo sync with iPhone). Should I use 2xSSD as ZFS L2ARC read-cache or maybe make a mirror with these SSDs and use as a pool for plugins?
I had the exact same question as you. My conclusion is that I'm almost certain you don't want to use it as L2ARC. My reasoning: 1) Are you sure you would benefit from an L2 read cache anyway? Is your typical working dataset that is accessed repeatedly larger than what can fit in memory? If so, can you increase the RAM? It will be much faster, and the cache algorithm for data in RAM is more sophisticated than data in L2ARC. 2) you will probably kill your SSDs quickly, unless they are high endurance (enterprise grade SSD or Optane)
you don't need to mirror the l2arc. Its data is already present in the pool so if it fails you don't face any data loss risks. I personally don't think adding l2arc will improve your performance significantly to notice, but I'm not a zfs guru. This is generally useful when you're caching HDDs with fast NVMe SSDs, for heavy-IOPS stuff like video editing.
That's not a size limit by design, but a practical limitation of real world use case, the amazing Lawrence Systems goes into mathematical details about how those numbers came about in this video: th-cam.com/video/M4DLChRXJog/w-d-xo.html
This was an interesting discussion. I am interested in the 20-100 tb realm of storage and RAM. Also this was focused more on the scale side of discussion and not so much on the NAS side. Helpful as starting point of discussion but there are more questions to discuss. Also I would like to see the tb transfers rather than the mb and gb. Former DAS user wandering in the darkness of NAS.
5:03 : That's because a SLOG is not exactly a write cache. This is only going to speed up synchronous workloads, and mostly will benefit random writes, not sequential.
Back in my Oracle DBA + Unix sysadmin days, we had an acronym: SAME (stripe and mirror everything). It kinda still applies, especially with spinning disks. Disk is cheap, mirror everything if you're concerned about write performance and fault tolerance. Of course if your data sets can fit on SSD or NVME then do that and get on with your life (I would still mirror it though).
Hm, watching this video there are actually more questions afterwards than before =) 1. When your initial tests already almost capped out on the full 10g speed, how did the host expect the speeds to improve through caching? =) 2. Why does the iX systems sales engineer explain SLOG wrong and reinforce the common misunderstanding that SLOG will act as a write cache for sync writes?! 3. The system I am running Scale on is an old Dell R730 with 512GB of RAM, lots of different variations of pools of HDDs and SSDs/NVME. The network speed is 25g. How come that through Samba I barely ever get close to the numbers 2GuysTek get in their tests here over SMB even if I test against a pool of very potent NVME disks only and as said, 8 times as much RAM (let alone much more CPU power). Have these tests been conducted with actual real life testing, or just with some synthetic test tools that don't tell anything about the real-world performance anyway? It'd be great to have another video where the impact of a metadata pool would be tested and also some SMB tuning that enables you to actually use these kinds of speeds in any real-world setups, rather than capping out at around 500-700Mb/s all the time due to the flaws of SMB. All the best! =)
@@charliebrown1947 That is funny to read, considering there are literally flow diagrams in the official documentation proving me right and as said, tested it in lab and surprise, official documentation is correct. Explain to me one thing: Who is the one of us that is acting like he knows everything? The guy that actually tested all this after reading the official documentation, explaining test series that were tried and proved his point that you can easily replicate, or the guy that managed to write 6 comments or so without any info exceeding "you better educate yourself".
It would be good to do an updated video with the special vdev on flash. It speeds up things considerably. Write cache and read cache are nice but in my experience make less of a difference compared with having the metadata offloaded onto ssd. Still good job gathering so much data.
So, one photographer, working on small projects and regular file transfers it really doesn’t matter which raid I choose as long as I have as much as I can fit in there to speed up the transfers? Makes sense if that is accurate.
Ok so i am setting up my new truenas server now just for 4k movies backup and streaming. Since i have 2x 12 TB HDDs with 4k remuxes... should i mirror them or raidz2?? Also what should be my ram? 32gb? Thanks!!
Yes to a mirror - that’s the only way you can survive a single disk failure with two drives. If you had 3 I’d recommend a RAIDZ1. In terms of RAM, 32GB is probably the minimum, if you can swing more, do it. You’ll have better overall performance.
Nice video, and hearing from ixSystems directly helps reinforce things for me. Mirrored vDEVs for storage using high capacity HDD is fine as long as you have one or more high end, DLP enabled SSDs in front of them as Log vDEVs. I don’t have a L2ARC because I have 128GB RAM and ARC is only consuming 62GB. I run iSCSI for vSphere on 10GB networking and never had a problem with disk performance.
If I'm not mistaken, your ARC is 62 GB more or less by definition - by default, the ARC will occupy (up to) half your memory (this is something that can be tuned).
Question, I have 2 x 2gb nvme sticks that I am attaching via carrier boards to slimsas ports (motherboard only has 1 M.2 slot, using for OS drive, and I don't have any more PCIe slots left). Can I mirror them and then partition them, so I can use small parts of them for discrete caches (read & write) and then the rest for SLOG? The primary use of this NAS is going to be VM hosting with a smidge of fileshare. Also, what is the suggested block size for a VM hosting scenario vs a fileshare scenario?
You cannot carve up parts of disks for caches in ZFS unfortunately, only whole disks. If you're going to be running VMs, get as much RAM in your host as you can, and build out a SLOG for sync writes out of the two NVMe SSDs you've added. I think a read cache is less valuable in that scenario.
@@2GuysTek Will do thanks! Was hoping to be able to put small files reads on NVME for a specific reason but oh well. The storage disks are all SSDs anyway. I can put 64gb of RAM right now. Will see if I can boost that later. Thanks again...
my debian desktop has a raidZ1 I use to mount my home directory... Definitely going to reconfigure it with mirrored pairs and throw in a log ssd for it the next time I change it up. Is it finally time to nuke my windows 11 install for good? :O
I threw 2 old NVMe SSDs into my server as read cache but performance didn't really go up in a noticeable way. Now I know why, wrong workloads (lots of small reads but very few ones bigger than my RAM).
is it possible to do one on encryption? vdev level encryption vs dataset encryption vs 2 layers of encryption. I feel like it will be relevant to many people
The other factor I’d like to hear more about with vdev design decisions is future expansion potential. Home users may start small (2-4 disks) then add more over time. Are mirrors really the only way to go to expand an existing pool? Aside from potentially creating an entirely new pool, which has other trade offs.
In 2021 there was talk about OpenZFS adding single-disk add functionality to existing RAIDZ VDEVs, however from everything I've read it sounded kinda kludgy and not very easy. I'm not sure where they're at with it today, I'll dig into that. That being said, you can expand a pool with RAIDZ(1,2,3) VDEVs by adding another VDEV to the pool. This will expand your pool size and give you better performance as well. It's still not an individual disk add, you'd need to have at least 3 disks (or more depending on RAIDZ type) to build another RAIDZ VDEV. So at least you're not locked into a Mirror VDEV only for expansion, which is better on parity-cost.
To add more to this. I just heard back that the single-disk add feature is still not available, so your only way to expand an existing pool is to add additional VDEVs to it.
I just built my TrueNAS with 4x HDD and 2x SSD and I went with the pool of 2 mirrors (HDD) and just a mirror (SSD), as I read somewhere that mirrors are the best. So to speed up day to day tasks I use the fast SSD pool, and for backup and media consumption the slow HDD pool. 64GB of RAM seems to be more than ok to support 28TB of my total storage
10:44 "mostly it's coming down to the performance you're after for your workload." Ttttthhhhhhhhaaaaaannnnnnkkkkkkk you! I have been shouting my damn lungs out for nearly a decade now. Cache in memory is meant to act as a way to not have to reach down into the disk. This was really irksome when people were like, "You need 1GB of memory per 1TB of total storage on your ZFS pool!" No. You don't. That's dumb. Closer to the truth would be you need 1GB of memory per 1TB of DATA in your pool. Because it takes exactly 0MB of memory to track empty storage. A better metric would be, you need more memory if your ZFS ARC hit ratio drops below about 70% regularly, and the performance hit is starting to irritate you. One thing I don't like is how people, even people in the know, talk about the SLOG. It's not a write cache, it's a secondary ZFS intent log. The ZFS intent log (ZIL) exists in every pool, typically on each data disk. It's essentially the journal in every other journaling file system. Before you write data to the disk, you write that you're starting an operation, then you write the data, then you tell the journal that you've written the data. ZFS does the same thing, though it actually writes the data to the ZIL as well. When people talk about caching writes, they're usually thinking about something like battery backed storage on RAID controllers. ZFS will never do this exact thing. When you add a SLOG to a pool, your ZIL is basically moving over there. This takes IO pressure off of your spinning rust data drives, and puts it on another drive. To that end, if you have a pool of spinning rust drives, and you add another spinning rust drive as the SLOG, you will see write performance increase, just not the massive increases that you might expect from a typical write cache. Your spinning rust data drives will absolutely thank you in the long run.
ARC, Sync TXGs, SLOG, and L2; the rabbit hole is very deep. I've spent a lot of time understanding how all this works. I wonder how many hours I've sat watching `zpool iostat -qly 10` trying to actually understand my workload.
I built out my truenas system with my old pc lol I have an amd 5950x, 128GB of ecc ddr4, and a pair of lsi 9305-16i cards hosting 28x h550 20TB drives. I have a 990pro 1tb for boot. And a pair of 118gb optane in mirror for slog. It's tied to the network with dual 10gbps links i use it for VM's but mostly for plex
I have a 8 disk pool with 2 4-disk Z1 Vdevs. Good balance of performance and ability to lose a disk. Offsite backup to make up for the risk of not running just a Z2 vdev. And because raid isn’t a backup.
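For anyone weighing layouts like this one, here is a rough Python sketch of the usable-capacity arithmetic. The 4 TB disk size and the function names are made up for illustration, and it ignores ZFS metadata, padding and free-space headroom, so treat the numbers as upper bounds rather than what a real pool will report.

```python
def raidz_usable_tb(vdevs: int, disks_per_vdev: int, disk_tb: float, parity: int) -> float:
    """Raw usable capacity of a pool of identical RAIDZ vdevs.

    Each RAIDZ vdev gives up `parity` disks' worth of space (1 for Z1, 2 for Z2).
    """
    return vdevs * (disks_per_vdev - parity) * disk_tb


def mirror_usable_tb(vdevs: int, disk_tb: float) -> float:
    """Raw usable capacity of a pool of mirror vdevs (one disk's worth per mirror)."""
    return vdevs * disk_tb


# Hypothetical example: eight 4 TB disks arranged three different ways.
DISK_TB = 4.0
layouts = {
    "2 x 4-disk RAIDZ1": raidz_usable_tb(vdevs=2, disks_per_vdev=4, disk_tb=DISK_TB, parity=1),
    "1 x 8-disk RAIDZ2": raidz_usable_tb(vdevs=1, disks_per_vdev=8, disk_tb=DISK_TB, parity=2),
    "4 x 2-disk mirrors": mirror_usable_tb(vdevs=4, disk_tb=DISK_TB),
}

for name, tb in layouts.items():
    print(f"{name}: {tb:.0f} TB usable (before ZFS overhead)")
```

Under these assumptions the two RAIDZ layouts come out at the same raw capacity, so the choice between them is about vdev count (performance) versus which disk failures the pool can survive, not about space.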
Great video, thanks. But one correction. You keep referring to a “write cache” when talking about the SLOG. But it’s really a ZIL (ZFS Intent Log) and doesn’t work the way you might think that a “write cache” might. Chris glossed over it. Going into that in more detail would be useful because this makes a big difference when sizing for the ZIL, which is mostly based on your network throughput.
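To put a number on "sized by network throughput", here is a back-of-the-envelope Python sketch. The 5 second flush interval comes from the "every 5 seconds" figure discussed in the video; the factor of two for transaction groups in flight is my own assumption as a safety margin, not something from the video or the docs, and the function name is made up.

```python
def slog_size_gib(link_gbit_per_s: float,
                  flush_interval_s: float = 5.0,
                  txgs_buffered: int = 2) -> float:
    """Rough upper bound on how much sync-write data a log device must hold.

    Assumes the network link is the only way data arrives and that it is
    fully saturated with sync writes for the whole interval.
    """
    bytes_per_s = link_gbit_per_s * 1e9 / 8          # Gbit/s -> bytes/s
    return bytes_per_s * flush_interval_s * txgs_buffered / 2**30


for link in (1, 10, 25):
    print(f"{link:>2} Gbit/s link -> about {slog_size_gib(link):.1f} GiB of log space")
```

Even a saturated 10 Gbit/s link lands in the low tens of GiB under these assumptions, which is why small, fast, high-endurance devices get recommended for log duty rather than large ones.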
It seems to me that if you're trying to compare purely the relative raw performance of the different layouts, that you would want to run the benchmark program LOCALLY on the NAS. Something like 'iozone' would be one such choice. Measuring the over-the-network performance would be a separate analysis.
I found that mirrors are way slower than dRAID with multiple vdevs. I have a 40 mirror SSD pool that runs at less than half the speed of my 60 HDD 4x dRAID pool in another enclosure with 3 SAS expanders going into one SAS controller maxing out at 4.8GB/s after overhead. One helper for my mirrors was going from 128K to 1M recordsize. But mirrors were still slower! I also disabled ZFS cache when running tests. dRAID is incredible with as little as 2 redundancy groups.
I guess caring about performance is important for work flow, as a home user I have been running a 2 HDD mirror and no cache for a few years now and I don't find it to be a big deal, but yet again my largest file is 15MB
My poor man's usage of OpenZFS 2.1.5 runs on a minimal install of Ubuntu 22.04 LTS on a Ryzen 3 2200G; 16GB; 512GB nvme; 128GB SSD; 2TB HDD. All my applications run in say 6 VMs of which Xubuntu with the communication stuff is loaded always (Email; Whatsie etc). I have 3 datapools: - one with the 11 most used VMs on the nvme-SSD (3400/2300MB/s), running with primarycache=metadata. Boot times of e.g Xubuntu are with caching ~6.5 seconds, while without caching it takes ~8 seconds. I wait ~1.5 seconds more, if I can save say 3GB of memory. - one with 60 more VMs on the first faster partition of the HDD. Here I have 2 levels of caching
The Log VDEV doesn't exactly function as you describe here. Chris touches on this a bit, but there's an extra little caveat to how the Log VDEV works: Async writes are cached in RAM and not written directly to disk by default. But sync must be committed to disk before being considered complete. A power failure or crash will result in the RAM write cache being lost. That's generally fine for async writes, but for sync writes (like Chris says, something like a database or virtual machine, etc.), lost writes could really screw things up and corrupt your data. The Log VDEV fills this gap by providing non volatile storage of the write intent log. Without it, sync writes have to wait on the spinning disks
Fair point. It would also be fair to say that if you're using ZFS, it would be smart to have a battery backup to protect against that situation regardless though. It's my understanding that even with a Log device, ZFS is going to use the RAM _first_ which exposes you to the same issue. Would you agree?
@@2GuysTek It helps to understand the exact steps the OS takes for sync writes:
1) An application of some kind requests to write data synchronously to ZFS
2) The data sent by the application is stored in RAM to be written to disk in a transaction
3) The data sent by the application is written to the *intent log* which exists on non-volatile storage (regular disks, not RAM)
4) The application is informed that the data it has requested to be written has been successfully saved
5) The transaction of writes in RAM is successfully written to the storage pool (this is the thing that "happens every 5 seconds" but not really that Chris talks about)
6) Now that the data is safely in the pool, the copy of the data we made in step 3 is deleted from the intent log
So yes, RAM first. The only time the intent log is actually read from is if a power loss or crash happens sometime after step 4 and before step 5 is finished. It's important to remember, the intent log ALWAYS exists. When you set up a Log VDEV you're just telling ZFS specifically where to put that data. Without a Log VDEV, it just lives on your regular storage VDEVs.
A few things to keep in mind when choosing disks for your Log VDEV:
1) You don't need a ton of storage. You're committing your writes to your disk every few seconds. So your log really only needs enough storage to hold a few seconds worth of data.
2) You really don't need a ton of storage. Even ignoring #1, remember we cache in RAM first. You can't cache more than that. A Log VDEV that is larger than your RAM is wasted space.
3) Most SSDs cache their writes in their own RAM inside of the SSD itself. It is possible for a power failure to hit at the exact moment where ZFS thinks the data is safe but the data is only in the RAM cache of the SSD. Always use enterprise SSDs that have *power loss protection* for your Log VDEV.
It's too bad Intel killed off Optane, because it is the ideal log drive. They're incredibly low latency and a lot of them write directly to the flash cells (no SSD RAM). In fact, most of it is super marked down right now if you want to pick some up for later use.
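The six steps above can be sketched as a toy Python model. This is only an illustration of the ordering described in the comment, not how OpenZFS is implemented; the class and method names are invented, and a real pool tracks far more state than three lists.

```python
class ToyPool:
    """Toy model of the sync-write path: RAM first, intent log as a backup."""

    def __init__(self) -> None:
        self.ram_txg: list[str] = []      # pending transaction group (volatile)
        self.intent_log: list[str] = []   # ZIL / Log VDEV records (non-volatile)
        self.pool: list[str] = []         # data committed to the storage pool
        self.log_reads = 0                # how often the intent log was read back

    def sync_write(self, data: str) -> str:
        self.ram_txg.append(data)         # step 2: stage the write in RAM
        self.intent_log.append(data)      # step 3: back it up on stable storage
        return "ack"                      # step 4: only now tell the application

    def flush_txg(self) -> None:
        self.pool.extend(self.ram_txg)    # step 5: commit from RAM, not from the log
        self.ram_txg.clear()
        self.intent_log.clear()           # step 6: the backup copy is no longer needed

    def crash_and_recover(self) -> None:
        self.ram_txg.clear()              # power loss: everything in RAM is gone
        self.log_reads += 1               # the only time the log is ever read
        self.pool.extend(self.intent_log) # replay acknowledged writes into the pool
        self.intent_log.clear()


demo = ToyPool()
demo.sync_write("block-a")
demo.flush_txg()                          # normal operation: log written, never read
demo.sync_write("block-b")
demo.crash_and_recover()                  # crash between step 4 and step 5
print(demo.pool, "log reads:", demo.log_reads)
```

In the normal path the log is written and then discarded without ever being read, which is the "write backup, not write cache" point several people make in this thread; the speedup from a fast Log VDEV comes from step 3 finishing sooner so that step 4 can happen sooner.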
@@biohazrd Good points, but I would like to further clarify about the size of Log VDEV in case of flash based media. Although it seems to be a waste using large SSDs for Log VDEV, that is not necessarily true in all cases. In heavy write environments, it's wise to use larger SSDs or drives from different series/manufacturers. Flash wear out (TBW) can kill smaller drives from the same series both fast and theoretically near simultaneously. So don't save on Log drive capacity.
@@miroslavstevic2036 Yeah, good point. I didn't think about over provisioning but you can really stretch your endurance with a larger SSD in that scenario.
@@biohazrd Also, smaller drives these days are not very good in terms of gb/$. It's like, do I pay $40 for 512GB or $50 for 1TB? I might as well just get double the TBW for $10 more.
Chris did an excellent job of explaining how ZFS works. Rich, great job getting him on a call for all of us to hear a proper and thorough explanation from an expert! excellent video. At 10 mins and 20 seconds in, masterful answer! TrueNAS is a fantastic storage product.
There's also draid now which is an abstraction of raidz with hot spares. I use it for my VMS and lxcs since resilvering time is much much faster and the performance is adequate for my uses. Hot spare capacity is distributed throughout the vdev so on a resilver data is read from all drives and written to all drives simultaneously instead of all drives mobbing the hot spare being resilvered.
My understanding is that dRAID isn't available until the future version of SCALE comes out. We'll certainly be evaluating the new RAID type when it comes to production!
Perfect demonstration of how little a random YouTuber understands about ZFS and TrueNAS. You just failed the basic ZFS understanding exam. Next time read the doc, it is clear enough. 🤷🏿♀️🤷🏿♀️🤷🏿♀️ 🤦🏿♀️🤦🏿♀️🤦🏿♀️
Testing this over SMB ruined any real performance testing. It has huge unpredictable overhead. You should have used something that will not warp the test results like NFS.
I disagree, however I also understand where you're coming from in terms of ARC. I think it's completely fair to test a fresh pool's performance because that's a normal state of a NAS in its functional life. Saying the only way to test performance is on existing, cached data, isn't correct either. Maybe a compromise is to run perf tests on fresh and warm data instead.
News Flash, you did the tests WRONG as well...... Sequential is NOT how you figure out IOPS. You need to test with random reads/writes. SEQUENTIAL would be good for backups/restores. Try running some VM's and really get ideas about how caching works. Also if you are using spinning rust, I would ALWAYS recommend READ and WRITE caches if you can afford it.
I really wished youtubers who show their face and background all dark-friendly wouldn't jump to fullscreen glaring-white in a split second. Let's be realistic about when and how us degenerates actually watch stuff.
Sure you can! Try this (as root): cd; mkdir zfstest; cd zfstest; truncate -s 100M disk{1..5}; zpool create mixedpool raidz /root/zfstest/disk{1..3} mirror /root/zfstest/disk{4,5} It works just fine.
Shame he doesn't understand how a SLOG works (not that that's unusual). A SLOG is NOT a cache. It's always written to and never read from (in a steady state)
Tom from Lawrence Systems did a video on it, so you just didn't know the best performance. It's ok, you've only been using TrueNAS for....errrr.... Years 😢
that dude from ix is wrong... it does not matter what speed you have on the transfer drive.. the only speed that matters is the drive you are writing to.. if the drive you are writing to is 5400 for speed and the drive you are writing from is 7200 it will slow down..just as the tests just showed... the tests prove the ix guy is lying... cache drives do nothing to speed the write process because the drive you are writing to is slower... all transfers slow down when you are writing to a slower drive.. ix needs to remove the cache drive options as it just wastes time and money.... using the ram as a pretend cache drive just burns out your ram faster...
So....the TL;DR is if you do ALL of that (you have a server that does everything), then you're effectively "screwed" in the sense that there is NO optimal configuration for you because ANY configuration that you will deploy will be the "less-than-optimal" configuration for the different types of things that you are doing with the system that does everything.
No. In my opinion the TL;DR is, generally speaking, to add as much RAM as you can to your host. Adding more RAM will give you the most noticeable improvement in performance over any VDEV layout in a 'does everything' use case. I think that knowing the 'easy button' is to add more RAM and go with a RAIDZ2 for a general purpose NAS takes away a lot of the confusion over what VDEV config you should use.
@@2GuysTek Two things: 1) Adding more RAM isn't a vdev configuration (per the title of your video). 2) re: using a raidz2 layout -- and whilst that might be the overall "average" layout that you can deploy for a "does everything" case, but as your own data shows (and also based on your discussion with Chris from iXSystems), different use cases have different recommended vdev layouts. But if you use one layout for a "does everything" system, then it is NOT the optimal vdev layout for the different use cases that a "does everything" will need to serve up. i.e. optimal is "here", and the actual deployed vdev performance is "here" -- at a less-than-optimal level of performance, for that use case. In other words, no single workload "wins", and they ALL lose some (level of performance), and that is the optimised solution where there is no clear winner, and everybody loses some (performance). (i.e. there isn't a vdev layout that's a clear winner for the different use cases, in a "does everything" system.)
The optimal thing to do would be to have multiple pools/VDEV for your workloads. Whether someone who is using a ZFS 'do it all' server has the resources and wants or actually needs to do that is a totally different matter but the implication here seems to be that you are limited to one VDEV type on a server which isn't the case.
@@nadtz That will depend on the capacity requirements, how much capacity you are willing and/or are able to sacrifice for redundancy/fault protection, etc. as a function of cost. There is ALWAYS the perfect "scientist" solution where if money wasn't an object, you can deploy the theoretically perfect solution. But if you had a budget of only $1000, but there aren't any changes to the statement of requirements, what you're going to end up deploying, based on that fixed, finite budget amount, will be very different than your theoretical ideal solution.
@@ewenchan1239 Obviously. The point is you said "...you're effectively "screwed" in the sense that there is NO optimal configuration..." and this is not true, 'scientist' or not. I clearly stated that having the resources, wanting or needing to deploy the optimal configuration is different from the fact that it is possible so you are just reiterating what I already said.
Chris's expertise is astounding, we're glad to have him in the iX Family!
This video is a must watch for anyone looking to expand their TrueNAS Knowledge. As always, fantastic job on the video 2GT Team!
Send me a mini please
Shoutout to my presales engineer Tyrel, can't wait for my M60 to show up in sunny FL.
@@RobSnyder Tyrel is the best! We cannot wait for it to land in your hands!
May I ask for a detail ?
Why with 12 drives in RaidZ2 are there 2 R/W queues? (Minute 14:07)
@@MattiaMigliorati Chris says 2 x 6 disk Z2 so 2 vdevs = 2 concurrent read/write requests. The more vdevs the faster you go.
This is probably one of the best videos on truenas on yt. I've learned a lot, thanks
Second that
Bro.... this cleared up a TON of the same questions I also had.... one big thing that i learned was that write cache SLOG is not used in SMB Shares, which is how I use my NAS at home. But if my VMs were writing to the same pool, then maybe it would benefit from that. The other thing I learned was that the "write every 5 seconds" is really "up to 5 seconds unless something tells it to write". i found this super helpful. I'm glad you asked all the same questions.
This is fantastic stuff!
I have noticed the performance improvements by throwing more RAM in a box. My friends complain that ZFS is a RAM hog. I just tell them that if you spent money on memory, wouldn't you want the server to use it all? By watching ZFS use up all the memory, I know that it is respecting my hard-earned dollars and putting that memory to work. I never want my resources sitting idle while performance suffers.
Thanks for this video!
zfs will only use the amount of memory you specify. by default, this is 50%
@@charliebrown1947 that only applies to ZFS on Linux
So I’m currently running a 13500 with 64Gb of DDR5… I have a 100TB ZFS pool for my Plex media and my video and photo editing footage, which I want to edit directly off the NAS… would I benefit from doubling up to 128GB?
run arc_summary | more from the CLI and look out for 'Cache hit ratio' and check your percentage. If it's approx 95% or higher then probably not worth the extra RAM. Much lower then yeah. The hit ratio shows you as a percentage what amount of your reads are coming from RAM. This parameter resets at reboot so will work best on a system that has been running in a typical fashion for a while.@@inderveerjohal7218
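For anyone wondering what that percentage actually means, here is a minimal Python sketch. The hit and miss counts are invented for illustration, and the 95% cutoff is only the rough rule of thumb from the comment above, not an official threshold.

```python
def arc_hit_ratio(hits: int, misses: int) -> float:
    """Percentage of reads that were served from ARC (RAM)."""
    total = hits + misses
    if total == 0:
        return 0.0  # no reads recorded yet, e.g. right after a reboot
    return 100.0 * hits / total


def more_ram_likely_helps(hits: int, misses: int, threshold_pct: float = 95.0) -> bool:
    """Rule of thumb: a hit ratio well below the threshold suggests more RAM may help."""
    return arc_hit_ratio(hits, misses) < threshold_pct


# Hypothetical counters from a box that has been running for a while
print(arc_hit_ratio(9_700, 300))          # 97.0
print(more_ram_likely_helps(9_700, 300))  # False
print(more_ram_likely_helps(700, 300))    # True
```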
without a doubt. @@inderveerjohal7218
Huge thanks to you and Chris for explaining all this for me, exactly what I needed as I am setting up my first TrueNAS.
Great video, and so thankful for the interview with Chris. Amazing.
Chris is awesome! All of the people at iX are fantastic! We really appreciated their insight on this video!
Concise and straightforward explanations. My TrueNAS server has 24GB of RAM, and I am using it primarily for backups of my home lab VMs and a network share for the occasions I decide to stash data away. Now I know that I could probably pull that SSD and use it for something else. Unfortunately, you have now planted the seed that I might want to reconfigure my VDEVs into 2 mirrors, versus a Z1. May have to dedicate an afternoon to a new project!
I know I made changes to my pools after going through this! Thanks for the comment!
Dropped in to add this comment, really like the style of this !! Don't get the expected results and get a real expert to explain in plain English giving real world examples. Very useful and informative.
as someone in process of upgrading my TrueNAS Scale hardware, i found this very helpful. Thank you!
So one question I would have (just starting my home NAS journey) is how do SSDs affect some of the recommendations starting at 11:38? In particular, the starting machine I have has 8 available 2.5" bays, and I'm going to start with 2 vdevs in RAIDZ2 with 2 TB drives. But I thought about using SSDs instead.
FYI: Optane nvme (U.2) is basically ideal for slog. I use two of the cheap 16GB NVME versions in a stripe to decent effect (boosted ~30MB/s sync writes to 250MB/s sync writes (till they fill.)) I am more concerned about unexpected power loss than those drives failing. Obviously a mirror would be wiser.
That's the biggest risk you have is power loss and the uncommitted SLOG data. Get yourself a UPS just to give yourself some breathing room!
@@2GuysTek One should get SSD disks (like NVMe) with PLP (Power Loss Protection).
Did those tests a few years ago with NVMe and got to the same conclusion and explanation. For enterprises with big data access, you will need those caches, but for labs or small enterprises, the best optimization to get is ADD MORE RAM. That's it! Once you understand this, everything is quite simple.
Fantastic info. I too was wondering why my ssd vdev wasn’t doing anything for smb copy speeds.
Coming from a classic RAID background in IT, I support the use of RAID10 (stripe of mirrors) or equivalent. Sure, it's 50% usable capacity, but i/o is fast and resilient, plus a drive replacement takes as long as a single full disk read and write (since that is all it needs to do), meaning you get back to nominal quicker than other raid configs. Just remember that RAID (including raidz) is not a replacement for backups.
Tip: Backups are a complete waste of money, until you need them, then they are worth every single cent ten times over.
Tip2: if you want to use truenas you will want to start with at least four drives, two cheap and small ones for the truenas OS (mirrored), and two (or more) large ones for data storage. You cannot use the OS drives for data storage (this is not a bug, it is by design). You can skirt this a bit by using virtualization for the truenas OS (installed to a virtual disk) and passing the physical data drives to the truenas vm to use directly (just don't use the vm capabilities of truenas scale as nested vm's is a whole new level of WTF). Took me a long time to fully understand this, hope it saves someone on the internet some time.
Did you retest with data sizes that exceeded the memory cache?
TLDR: the average user doesn't go hard enough to notice the fact that, yes, there is a difference between reads depending on your vdev layouts.
Note that, like most of these videos, the zil/log is being confused with a write cache.
It's frustrating to see ix reps reinforce the misunderstanding.
It is not a write cache, it's a write backup. As long as the write works normally, the ZIL will never be read at all. The ZFS write cache is in RAM (transaction groups); the ZIL or SLOG backs that up for sync writes in case the RAM copy is lost.
Poorly designed slog will slow your pool down, since it's NOT a write cache, you are adding another operation.
Exactly, the benefit of the SLOG is that the system can reply "sync write done" as soon as this backup is written, whereas it will not send that reply while the data is only in RAM.
So the physical write process is not accelerated, but the overhead implemented by sync writes and the required replies is moderated by a very fast SLOG device that will allow much quicker replies to be sent.
@@Burnman83 except it is accelerated because it's been written to your fast slog device, which is just as good as writing it to the spinners. it IS effectively working as a cache also.
@@charliebrown1947 Yes, it accelerates the ability to reply positively to a sync write, because writing to the SLOG will already trigger the "ack", but it is NOT acting as a cache, because in that case it would delete the written data from RAM and then transfer it from SLOG to disk. Afaik this is not what is happening. Data is kept in RAM and written from RAM to SLOG and disk in parallel. The only advantage is the delta between the SLOG SSDs acknowledging the write vs the disks, which means the smaller the files the more acceleration and vice versa, but no, it is still not a cache ;)
@Burnman83 once the data is written to the slog it is as good as written to the pool. It can be purged from arc and it is literally behaving as a cache. I don't know why you're trying to say it isn't. The data is cached in the slog and written to pool later.
@@charliebrown1947 ARC is RAM-based read cache, but I know what you wanted to say.
From all I know this is not the case, SLOG is never read, only written and in case of a disaster, power outage or whatever read later during the remediation. Otherwise data remains in RAM, effectively blocking it and slowing down the transfer when you are running out. That means, big SLOG is not worth it, as it will only help for bursts anyway.
If you had a piece of vendor documentation for me telling otherwise, please go ahead and prove me wrong, but until then I stick with what I learned, and not with what I think would make sense.
This is by no means meant as an offense, I'd love if you were right, would give me all the tools I need to build lightning fast SLOG and force sync on all writes and like that have permanent insane write performance, but I'm afraid this is indeed not how SLOG works.
This was extremely helpful.
I'm in the process of doing my homework to migrate my home server to TrueNAS and was planning on spending on SSD's for caching. Now I know to get more memory instead.
So happy it helped! Thanks for watching!
Thank you! The interview was gold!
I've got 8x 4TB drives in RAIDZ2 and 2x 500GB SSD. I need NAS mostly for Plex (movies and music) and Nextcloud/Piwigo (photo sync with iPhone). Should I use 2xSSD as ZFS L2ARC read-cache or maybe make a mirror with these SSDs and use as a pool for plugins?
I had the exact same question as you. My conclusion is that I'm almost certain you don't want to use it as L2ARC.
My reasoning:
1) Are you sure you would benefit from an L2 read cache anyway? Is your typical working dataset that is accessed repeatedly larger than what can fit in memory? If so, can you increase the RAM? It will be much faster, and the cache algorithm for data in RAM is more sophisticated than data in L2ARC.
2) you will probably kill your SSDs quickly, unless they are high endurance (enterprise grade SSD or Optane)
you don't need to mirror the l2arc. Its data is already present in the pool so if it fails you don't face any data loss risks.
I personally don't think adding l2arc will improve your performance significantly to notice, but I'm not a zfs guru.
This is generally useful when you're caching HDDs with fast NVMe SSDs, for heavy-IOPS stuff like video editing.
Is there a size limit by design for the SLOG cache? I read somewhere that bigger than 8GB storage is not used in the process.
That's not a size limit by design, but a practical limitation of real world use case, the amazing Lawrence Systems goes into mathematical details about how those numbers came about in this video: th-cam.com/video/M4DLChRXJog/w-d-xo.html
This was an interesting discussion.
I am interested in the 20-100 tb realm of storage and RAM.
Also this was focused more on the scale side of discussion and not so much on the NAS side.
Helpful as starting point of discussion but there are more questions to discuss.
Also I would like to see the tb transfers rather than the mb and gb.
Former DAS user wandering in the darkness of NAS.
5:03 : That's because a SLOG is not exactly a write cache. This is only going to speed up synchronous workloads, and mostly will benefit random writes, not sequential.
Back in my Oracle DBA + Unix sysadmin days, we had an acronym: SAME (stripe and mirror everything). It kinda still applies, especially with spinning disks. Disk is cheap, mirror everything if you're concerned about write performance and fault tolerance. Of course if your data sets can fit on SSD or NVME then do that and get on with your life (I would still mirror it though).
Hm, watching this video there are actually more questions afterwards than before =)
1. When your initial tests already almost capped out on the full 10g speed, how did the host expect the speeds to improve through caching? =)
2. Why does the iX Systems sales engineer explain SLOG wrong and reinforce the common misunderstanding that SLOG will act as a write cache for sync writes?!
3. The system I am running Scale on is an old Dell R730 with 512GB of RAM and lots of different variations of pools of HDDs and SSDs/NVMe. The network speed is 25g. How come that through Samba I barely ever get close to the numbers 2GuysTek get in their tests here over SMB, even if I test against a pool of very potent NVMe disks only and, as said, 8 times as much RAM (let alone much more CPU power)? Have these tests been conducted with actual real-life testing, or just with some synthetic test tools that don't tell anything about the real-world performance anyway?
It'd be great to have another video where the impact of a metadata pool would be tested and also some SMB tuning that enables you to actually use these kinds of speeds in any real-world setups, rather than capping out at around 500-700Mb/s all the time due to the flaws of SMB.
All the best! =)
You'll understand if you spend the time to research and think logically instead of acting like you know everything.
@@charliebrown1947 That is funny to read, considering there are literally flow diagrams in the official documentation proving me right and, as said, I tested it in a lab and, surprise, the official documentation is correct.
Explain to me one thing: Who is the one of us that is acting like he knows everything? The guy that actually tested all this after reading the official documentation, explaining a test series that was tried and proved his point and that you can easily replicate, or the guy that managed to write 6 comments or so without any info exceeding "you better educate yourself".
@Burnman83 okay boss! You're so smart!
@@charliebrown1947 Thanks, champ. Appreciate it.
@@charliebrown1947 can you stop trolling this guy? I'm interested in what he's saying, but you always stop him with irrelevant bashing.
It would be good to do an updated video with the special vdev on flash. It speeds up things considerably. Write cache and read cache are nice but in my experience make less of a difference compared with having the metadata offloaded onto ssd. Still good job gathering so much data.
So, one photographer, working on small projects and regular file transfers it really doesn’t matter which raid I choose as long as I have as much as I can fit in there to speed up the transfers? Makes sense if that is accurate.
There is a very old but still relevant blog on testing up to 24 drives with different configuration performance for HDD and SSD.
Dang! Thanks Rich. I definitely got so many of my burning questions answered.
Ok, so I am now setting up my new TrueNAS server just for 4K movie backup and streaming.
Since I have 2x 12 TB HDDs with 4K remuxes, should I mirror them or use RAIDZ2??
Also, what should my RAM be? 32GB?
Thanks!!
Should i mirror
Yes to a mirror - that’s the only way you can survive a single disk failure with two drives. If you had 3 I’d recommend a RAIDZ1. In terms of RAM, 32GB is probably the minimum, if you can swing more, do it. You’ll have better overall performance.
Nice video, and hearing from ixSystems directly helps re-enforce things for me.
Mirrored VDEVs for storage using high-capacity HDDs are fine as long as you have one or more high-end, PLP-enabled SSDs in front of them as Log VDEVs. I don’t have an L2ARC because I have 128GB RAM and ARC is only consuming 62GB.
I run iSCSI for vSphere on 10GB networking and never had a problem with disk performance.
If I'm not mistaken, your ARC is 62 GB more or less by definition - by default, the ARC will occupy (up to) half your memory (this is something that can be tuned).
Question, I have 2 x 2gb nvme sticks that I am attaching via carrier boards to slimsas ports (motherboard only has 1 M.2 slot, using for OS drive, and I don't have anymore PCIe slots left) can I mirror them and then partition them, so I can use a small parts of them for discrete caches (read & write) and then the rest for SLOG? The primary use of this NAS is going to be VM hosting with a smigg of fileshare. Also, what is the suggested block size for VM hosting scenario vs a fileshare scenario?
You cannot carve up parts of disks for caches in ZFS unfortunately, only whole disks. If you're going to be running VMs, get as much RAM in your host as you can, and build out a SLOG for sync writes out of the two NVMe SSDs you've added. I think a read cache is less valuable in that scenario.
@@2GuysTek Will do thanks! Was hoping to be able to put small files reads on NVME for a specific reason but oh well. The storage disks are all SSDs anyway. I can put 64gb of RAM right now. Will see if I can boost that later. Thanks again...
A pair of SATA SSDs more than enough for OS; no need to waste NVME on OS.
my debian desktop has a raidZ1 I use to mount my home directory... Definitely going to reconfigure it with mirrored pairs and throw in a log ssd for it the next time I change it up. Is it finally time to nuke my windows 11 install for good? :O
I threw 2 old NVMe SSDs into my server as read cache but performance didn't really go up in a noticeable way.
Now I know why, wrong workloads (lots of small reads but very few ones bigger than my RAM).
Wow! That was EXCELLENT!
Sadly they didn’t talk about special vdev…
Great insight, thanks Chris!
Thanks both for explaining ZFS, really a great video.
Happy it helped!
The explanation taught me a lot!
What about Special device/cache attached to a Pool?
Wow! Such an informative video! Thank you so much!
is it possible to do one on encryption? vdev level encryption vs dataset encryption vs 2 layers of encryption. I feel like it will be relevant to many people
would have been nice to put a single mirrored pair up against the 4x pairs to see how well they scale
This was a very informative video. Kudos for sharing
The other factor I’d like to hear more about with vdev design decisions is future expansion potential. Home users may start small (2-4 disks) then add more over time. Are mirrors really the only way to go to expand an existing pool? Aside from potentially creating an entirely new pool, which has other trade offs.
In 2021 there was talk about OpenZFS adding single-disk add functionality to an existing RAIDZ VDEVs, however from everything I've read it sounded kinda kludgy and not very easy. I'm not sure where they're at with it today, I'll dig into that. That being said, you can expand a pool with RAIDZ(1,2,3) VDEVs by adding another VDEV to the pool. This will expand your pool size and give you better performance as well. It's still not an individual disk add, you'd need to have at least 3 disks (or more depending on RAIDZ type) to build another RAIDZ VDEV. So at least you're not locked into a Mirror VDEV only for expansion which is better on parity-cost.
To add more to this. I just heard back that the single-disk add feature is still not available, so your only way to expand an existing pool is to add additional VDEVs to it.
@@2GuysTek or change every drive to a bigger one
I just built my TrueNAS with 4x HDD and 2x SSD and I went with a pool of 2 mirrors (HDD) and a single mirror (SSD), as I read somewhere that mirrors are the best. So to speed up day-to-day tasks I use the fast SSD pool, and for backup and media consumption the slow HDD pool. 64GB of RAM seems to be more than OK to support 28TB of my total storage
The only thing about mirrors is that if they are 2-disk mirrors, 2 disks failing in one mirror vdev will kill the whole pool.
so what is z1 best for?
10:44 "mostly it's coming down to the performance you're after for your workload."
Ttttthhhhhhhhaaaaaannnnnnkkkkkkk you! I have been shouting my damn lungs out for nearly a decade now. Cache in memory is meant to act as a way to not have to reach down into the disk. This was really irksome when people were like, "You need 1GB of memory per 1TB of total storage on your ZFS pool!" No. You don't. That's dumb. Closer to the truth would be you need 1GB of memory per 1TB of DATA in your pool. Because it takes exactly 0MB of memory to track empty storage. A better metric would be, you need more memory if your ZFS ARC hit ratio drops below about 70% regularly, and the performance hit is starting to irritate you.
One thing I don't like is how people, even people in the know, talk about the SLOG. It's not a write cache, it's a secondary ZFS intent log. The ZFS intent log (ZIL) exists in every pool, typically on each data disk. It's essentially the journal in every other journaling file system. Before you write data to the disk, you write that you're starting an operation, then you write the data, then you tell the journal that you've written the data. ZFS does the same thing, though it actually writes the data to the ZIL as well.
When people talk about caching writes, they're usually thinking about something like battery-backed storage on RAID controllers. ZFS will never do this exact thing. When you add a SLOG to a pool, your ZIL is basically moving over there. This takes IO pressure off of your spinning rust data drives, and puts it on another drive. To that end, if you have a pool of spinning rust drives, and you add another spinning rust drive as the SLOG, you will see write performance increase, just not the massive increases that you might expect from a typical write cache. Your spinning rust data drives will absolutely thank you in the long run.
Good video. Kindly share the test results for me.
ARC, Sync TXGs, SLOG, and L2; the rabbit hole is very deep. I've spent a lot of time understanding how all this works. I wonder how many hours I've sat watching `zpool iostat -qly 10` trying to actually understand my workload.
Great video!
This is some great info. Awesome vid.
Why would you test over smb?
It's a NAS, and its job is to push data over a network, and for the majority of the world, that's done via SMB/CIFS.
Is RaidZ expansion into TrueNAS now?
I built out my truenas system with my old pc lol
I have an amd 5950x, 128GB of ecc ddr4, and a pair of lsi 9305-16i cards hosting 28x h550 20TB drives.
I have a 990pro 1tb for boot.
And a pair of 118gb optane in mirror for slog.
It's tied to the network with dual 10gbps links i use it for VM's but mostly for plex
I have a 8 disk pool with 2 4-disk Z1 Vdevs. Good balance of performance and ability to lose a disk. Offsite backup to make up for the risk of not running just a Z2 vdev. And because raid isn’t a backup.
I've seen around 4 videos just like this one. Why are you not showing us how to set it up?
Excellent video, thank you!
Great video, thanks. But one correction. You keep referring to a “write cache” when talking about the SLOG. But it’s really a ZIL (ZFS Intent Log) and doesn’t work the way you might think that a “write cache” might. Chris glossed over it. Going into that in more detail would be useful because this makes a big difference when sizing for the ZIL, which is mostly based on your network throughput.
What about metadata??????
It seems to me that if you're trying to compare purely the relative raw performance of the different layouts, that you would want to run the benchmark program LOCALLY on the NAS. Something like 'iozone' would be one such choice. Measuring the over-the-network performance would be a separate analysis.
Great discussion!
Are the results similar because they are hitting the l1 arc (ram)? edit: hah should have kept watching!
I found that mirrors are way slower than dRAID with multiple vdevs. I have a 40 mirror SSD pool that runs at less than half the speed of my 60 HDD 4x dRAID pool in another enclosure with 3 SAS expanders going into one SAS controller maxing out at 4.8GB/s after overhead. One helper for my mirrors was going from 128K to 1M recordsize. But mirrors were still slower! I also disabled ZFS cache when running tests. dRAID is incredible with as little as 2 redundancy groups.
I guess caring about performance is important for workflow. As a home user I have been running a 2 HDD mirror and no cache for a few years now and I don't find it to be a big deal, but then again my largest file is 15MB
That's fair - It's all about use case and what you're looking to get out of your gear.
Spread sheet please.
fantastic!!! thank you 😙
...And you'd lose that bet. ZFS in production one week after it came out in Solaris 10 6/06 (u2) here.
Great info. Thanks.
So helpful.. thanks 🙏
My poor man's usage of OpenZFS 2.1.5 runs on a minimal install of Ubuntu 22.04 LTS on a Ryzen 3 2200G; 16GB; 512GB nvme; 128GB SSD; 2TB HDD.
All my application run in say 6 VMs of which Xubuntu with the communication stuff is loaded always (Email; Whatsie etc). I have 3 datapools:
- one with the 11 most used VMs on the nvme-SSD (3400/2300MB/s), running with primarycache=metadata. Boot times of e.g Xubuntu are with caching ~6,5 seconds, while without caching it takes ~8 seconds. I wait ~1.5 seconds more, if I can save say 3GB of memory.
- one with 60 more VMs on the first faster partition of the HDD. Here I have 2 levels of caching
How is the wear on those SSD and NVMe drives?
thanks for the video!
The Log VDEV doesn't exactly function as you describe here. Chris touches on this a bit, but there's an extra little caveat to how the Log VDEV works:
Async writes are cached in RAM and not written directly to disk by default. But sync must be committed to disk before being considered complete. A power failure or crash will result in the RAM write cache being lost. That's generally fine for async writes, but for sync writes (like Chris says, something like a database or virtual machine, etc.), lost writes could really screw things up and corrupt your data. The Log VDEV fills this gap by providing non volatile storage of the write intent log. Without it, sync writes have to wait on the spinning disks
Fair point. It would also be fair to say that if you're using ZFS, it would be smart to have a battery backup to protect against that situation regardless though. It's my understanding that even with a Log device, ZFS is going to use the RAM _first_ which exposes you to the same issue. Would you agree?
@@2GuysTek It helps to understand the exact steps the OS takes for sync writes:
1) An application of some kind requests to write data synchronously to ZFS
2) The data sent by the application is stored in RAM to be written to disk in a transaction
3) The data sent by the application is written to the *intent log* which exists on non-volatile storage (regular disks, not RAM)
4) The application is informed that the data it has requested to be written has been successfully saved
5) The transaction of writes in RAM is successfully written to the storage pool (this is the thing that "happens every 5 seconds" but not really that Chris talks about)
6) Now that the data is safely in the pool, the copy of the data we made in step 3 is deleted from the intent log
So yes, RAM first. The only time the intent log is actually read from is if a power loss or crash happens sometime after step 4 and before step 5 is finished.
It's important to remember, the intent log ALWAYS exists. When you set up a Log VDEV you're just telling ZFS specifically where to put that data. Without a Log VDEV, it just lives on your regular storage VDEVs.
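The six steps above can be mimicked with a toy Python model. This is only a sketch of the ordering described in the comment, not how OpenZFS is actually implemented; the class and method names are invented for the example.

```python
class ToyPool:
    """Toy model of the sync-write flow: RAM first, intent log as a backup."""

    def __init__(self):
        self.ram_txg = []     # step 2: pending transaction group held in RAM
        self.intent_log = []  # step 3: copy on non-volatile storage (ZIL or Log VDEV)
        self.pool = []        # data that has reached the data VDEVs

    def sync_write(self, data):
        self.ram_txg.append(data)     # step 2
        self.intent_log.append(data)  # step 3
        return "ack"                  # step 4: only now is the application told "done"

    def commit_txg(self):
        self.pool.extend(self.ram_txg)  # step 5: the transaction is written to the pool
        self.ram_txg.clear()
        self.intent_log.clear()         # step 6: the log copy is no longer needed

    def crash_and_recover(self):
        self.ram_txg.clear()               # power loss: everything in RAM is gone
        self.pool.extend(self.intent_log)  # the ONLY time the intent log is read
        self.intent_log.clear()


p = ToyPool()
p.sync_write("block-a")
p.commit_txg()          # normal path: the log is written but never read
p.sync_write("block-b")
p.crash_and_recover()   # crash after the ack, before the commit
print(p.pool)           # ['block-a', 'block-b']
```

In the normal path the intent log is written and then discarded without ever being read, which is why people in this thread object to calling it a write cache.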
A few things to keep in mind when choosing disks for your Log VDEV:
1) You don't need a ton of storage. You're committing your writes to your disk every few seconds. So your log really only needs enough storage to hold a few seconds worth of data.
2) You really don't need a ton of storage. Even ignoring #1, remember we cache in RAM first. You can't cache more than that. A Log VDEV that is larger than your RAM is wasted space.
3) Most SSDs cache their writes in their own RAM inside of the SSD itself. It is possible for a power failure to hit at the exact moment where ZFS thinks the data is safe but the data is only in the RAM cache of the SSD. Always use enterprise SSDs that have *power loss protection* for your Log VDEV.
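The "few seconds worth of data" point can be turned into rough arithmetic. The sketch below assumes incoming sync writes are capped by the network link and that the pool commits about every 5 seconds; both are simplifications taken from this thread, and the function name is made up for the example.

```python
def slog_bytes_needed(link_gbit_per_s: float, seconds_between_commits: float = 5.0) -> float:
    """Upper bound on in-flight sync data: link speed times the commit interval."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8  # gigabits per second -> bytes per second
    return bytes_per_second * seconds_between_commits


# A 10 Gbit/s link with roughly 5 seconds between commits
print(slog_bytes_needed(10) / 1e9)  # 6.25 (GB)
```

Even with generous headroom on top of that figure, the log only ever holds a handful of gigabytes under these assumptions, which is where the small SLOG size recommendations come from.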
It's too bad Intel killed off Optane, because it is the ideal log drive. They're incredibly low latency and a lot of them write directly to the flash cells (no SSD RAM). In fact, most of it is super marked down right now if you want to pick some up for later use.
@@biohazrd Good points, but I would like to further clarify about the size of Log VDEV in case of flash based media. Although it seems to be a waste using large SSDs for Log VDEV, that is not necessarily true in all cases. In a heavy write environments, it's wise to use larger SSDs or drives from a different series/manufacturers. Flash wear out (TBW) can kill smaller drives from the same series both fast and theoretically near simultaneously. So don't save on Log drive capacity.
@@miroslavstevic2036 Yeah, good point. I didn't think about over provisioning but you can really stretch your endurance with a larger SSD in that scenario.
@@biohazrd Also, smaller drives these days are not very good in terms of gb/$. It's like, do I pay $40 for 512GB or $50 for 1TB? I might as well just get double the TBW for $10 more.
Wow, great information.
With all the idiots publishing videos about TrueNAS, I'm glad someone is making good informational videos
I just got four 4TB wd red SSDs imma just run Raidz1 i guess should be fine
That'll be fine.
@@2GuysTek thanks!
discord link in the description is no good
Good catch! Fixed and here: discord.gg/5TcfBWBB7S
Chris did an excellent job of explaining how ZFS works. Rich, great job getting him on a call for all of us to hear a proper and thorough explanation from an expert! excellent video. At 10 mins and 20 seconds in, masterful answer! TrueNAS is a fantastic storage product.
Actually, he really didn't. Didn't explain caching correctly. I expected better
Basically I can't do anything lol. I only have three 14TB hard drives and like five 1TB SSDs. I still don't know what I should do lol
you also have to verify you aren't testing with highly compressible data.... you can write 500 TB of zeros while only writing 500MB/s to disk
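A quick way to see why all-zero test data is misleading is to compress it. The Python sketch below uses zlib purely as a stand-in; ZFS datasets typically use other algorithms such as LZ4 or ZSTD, so the exact numbers will differ, but the effect is the same.

```python
import os
import zlib

SIZE = 1_000_000  # 1 MB of test data

zeros = bytes(SIZE)            # what a naive benchmark often writes
random_data = os.urandom(SIZE)  # incompressible, closer to real media files

zeros_compressed = len(zlib.compress(zeros))
random_compressed = len(zlib.compress(random_data))

print(f"zeros:  {SIZE} -> {zeros_compressed} bytes")
print(f"random: {SIZE} -> {random_compressed} bytes")
```

With compression enabled on the dataset, a benchmark that writes zeros mostly measures the CPU and RAM rather than the disks, so test files with random or realistic content give more honest numbers.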
You can have more than 1 L2ARC per pool
There's also draid now which is an abstraction of raidz with hot spares. I use it for my VMS and lxcs since resilvering time is much much faster and the performance is adequate for my uses. Hot spare capacity is distributed throughout the vdev so on a resilver data is read from all drives and written to all drives simultaneously instead of all drives mobbing the hot spare being resilvered.
My understanding is that dRAID isn't available until the future version of SCALE comes out. We'll certainly be evaluating the new RAID type when it comes to production!
Perfect demonstration of how little a random YouTuber understands about ZFS and TrueNAS. You just failed the basic ZFS understanding exam.
Next time read the docs; they are clear enough.
🤷🏿♀️🤷🏿♀️🤷🏿♀️ 🤦🏿♀️🤦🏿♀️🤦🏿♀️
ME FIRST !!!! Watching > Coffee In Hand !
Chris was great
Before watching the video I'm going to go ahead and say mirrored VDEVS
Then come back and see if I'm right 😂
...And?
Testing this over SMB ruined any real performance testing. It has huge unpredictable overhead. You should have used something that will not warp the test results like NFS.
you cant compare performance of fresh pools.
I disagree, however I also understand where you're coming from in terms of ARC. I think it's completely fair to test a fresh pool's performance because that's a normal state of a NAS in its functional life. Saying the only way to test performance is on existing, cached data isn't correct either. Maybe a compromise is to run perf tests on fresh and warm data instead.
@2GuysTek I'm not talking about arc or cache. I'm speaking to the actual use case of a nas which is not a brand new empty pool.
News Flash, you did the tests WRONG as well...... Sequential is NOT how you figure out IOPS. You need to test with random reads/writes. SEQUENTIAL would be good for backups/restores. Try running some VM's and really get ideas about how caching works. Also if you are using spinning rust, I would ALWAYS recommend READ and WRITE caches if you can afford it.
I really wished youtubers who show their face and background all dark-friendly wouldn't jump to fullscreen glaring-white in a split second. Let's be realistic about when and how us degenerates actually watch stuff.
The definition of this video almost makes me uncomfortable. lol It's so clear I can see every single beard stubble and chest hair.
No you cannot combine different VDEV types in a single pool tho
Sure you can! Try this:
cd; mkdir zfstest; cd zfstest; truncate -s 100M disk{1..5}; zpool create mixedpool raidz /root/zfstest/disk{1..3} mirror /root/zfstest/disk{4,5}
It works just fine.
@@exscape Thanks, can mix and should mix is very different then :P
Shame he doesn't understand how a SLOG works (not that that's unusual). A SLOG is NOT a cache. It's always written to and never read from (in a steady state)
Tom from Lawrence Systems did a video on it, so you just didn't know the best performance. It's ok, you've only been using TrueNAS for....errrr.... Years 😢
that dude from ix is wrong... it does not matter what speed you have on the transfer drive.. the only speed that matters is the drive you are writing to.. if the drive you are writing to is 5400 for speed and the drive you are writing from is 7200 it will slow down..just as the tests just showed... the tests prove the ix guy is lying... cache drives do nothing to speed the write process because the drive you are writing to is slower... all transfers slow down when you are writing to a slower drive.. ix needs to remove the cache drive options as it just wastes time and money.... using the ram as a pretend cache drive just burns out your ram faster...
Stopped watching once this dude said a SLOG is a write cache. And his explanation was even worse, totally inaccurate
I never have anything wrong, noob.
What an Ad….
So....the TL;DR is if you do ALL of that (you have a server that does everything), then you're effectively "screwed" in the sense that there is NO optimal configuration for you because ANY configuration that you will deploy will be the "less-than-optimal" configuration for the different types of things that you are doing with the system that does everything.
No. In my opinion the TL;DR is, generally speaking, to add as much RAM as you can to your host. Adding more RAM will give you the most noticeable improvement in performance over any VDEV layout in a 'does everything' use case. I think that knowing the 'easy button' is to add more RAM and go with a RAIDZ2 for a general-purpose NAS takes a lot of the confusion out of deciding what VDEV config you should use.
@@2GuysTek
Two things:
1) Adding more RAM isn't a vdev configuration (per the title of your video).
2) Re: using a raidz2 layout -- that might be the overall "average" layout that you can deploy for a "does everything" case, but as your own data shows (and also based on your discussion with Chris from iXsystems), different use cases have different recommended vdev layouts. If you use one layout for a "does everything" system, then it is NOT the optimal vdev layout for the different use cases that a "does everything" system will need to serve up.
i.e. optimal is "here", and the actual deployed vdev performance is "here" -- at a less-than-optimal level of performance, for that use case.
In other words, no single workload "wins", and they ALL lose some (level of performance). That is the optimised solution: there is no clear winner, and everybody loses some (performance).
(i.e. there isn't a vdev layout that's a clear winner for the different use cases, in a "does everything" system.)
The optimal thing to do would be to have multiple pools/VDEVs for your workloads. Whether someone who is using a ZFS 'do it all' server has the resources, and wants or actually needs to do that, is a totally different matter, but the implication here seems to be that you are limited to one VDEV type on a server, which isn't the case.
@@nadtz
That will depend on the capacity requirements, how much capacity you are willing and/or are able to sacrifice for redundancy/fault protection, etc. as a function of cost.
There is ALWAYS the perfect "scientist" solution where, if money were no object, you could deploy the theoretically perfect solution.
But if you had a budget of only $1000, with no changes to the statement of requirements, what you're going to end up deploying, based on that fixed, finite budget, will be very different from your theoretical ideal solution.
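To put rough numbers on the capacity-versus-redundancy trade-off described above, here is a back-of-the-envelope sketch. The disk count and size (6 disks of 4 TB) are made up for illustration, the helper names are hypothetical, and the arithmetic deliberately ignores ZFS metadata, padding and reserved space, so real usable capacity will be somewhat lower.

```shell
# Rough usable-capacity estimates for different vdev layouts.
# Ignores ZFS metadata, padding and reserved space; figures are hypothetical.

# raidz_usable DISKS PARITY SIZE_TB -> usable TB for a single RAIDZ vdev
raidz_usable() {
  echo $(( ($1 - $2) * $3 ))
}

# mirror_usable DISKS WAYS SIZE_TB -> usable TB for a pool of N-way mirrors
mirror_usable() {
  echo $(( ($1 / $2) * $3 ))
}

echo "RAIDZ1:        $(raidz_usable 6 1 4) TB"
echo "RAIDZ2:        $(raidz_usable 6 2 4) TB"
echo "2-way mirrors: $(mirror_usable 6 2 4) TB"
```

With the same six disks, the layouts differ by several terabytes of usable space, which is the cost side of the argument when the budget is fixed.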
@@ewenchan1239 Obviously. The point is you said
"...you're effectively "screwed" in the sense that there is NO optimal configuration..."
and this is not true, 'scientist' or not. I clearly stated that having the resources for, wanting or needing to deploy the optimal configuration is different from the fact that it is possible, so you are just reiterating what I already said.
Great video, thank you!