Would having a SLOG and a cache device still make sense when there is already a special vdev in the pool?
I'm building a compact storage server that fits 64 GB of RAM, 2 NVMes and 4 HDDs. I could imagine partitioning the NVMe drives equally so I have everything mirrored plus a striped cache. Would that be useful? What is a good way to measure this?
Man, I had heard about 45Drives from the Lawrence Systems channel, but never found your channel until today. I'm pretty basic with ZFS, but I've been using it since FreeNAS in 2017 or 2018. I found your channel as I'm getting into large video storage pools now, a 38TB RAIDZ2 pool of 4TB SSDs, and I'm looking into metadata/special vdevs for ZFS. I'm going to be going through your channel tonight :) My big question is about adding a metadata drive in current TrueNAS Core: can it be done, and is it safe? I would reuse old WD870 1TB NVMes in a mirror. Is this safe for my data? Is this something that has to be done when the pool is built, or is it safe to add to an existing huge pool? And will a metadata failure cause huge data loss on my drives, forcing a rebuild from rust?
I know I am way behind (over 2 years). Love ZFS, and still expanding and learning!
Great vid, but won't 'hot' metadata live in your ARC (RAM) anyway, and isn't that the fastest place to have it?
It would. But the ARC evicts stuff all the time, so your desired metadata may not stay there; tuning parameters can help with this, but having metadata on SSD/NVMe guarantees fast access. And the special vdev increases pool capacity, so it's not "wasted" space. Worth considering if you have spare capacity for 2x SSD/NVMe, and you really need it on very large pools or when handing out a lot of zvols (block storage).
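For reference, here's a minimal sketch of what adding one looks like, assuming a pool named "tank" and two spare NVMe drives (the device paths are placeholders):

  # Add a mirrored special (metadata) vdev to an existing pool called "tank".
  # Replace the by-id paths with your actual NVMe devices.
  zpool add tank special mirror /dev/disk/by-id/nvme-DRIVE_A /dev/disk/by-id/nvme-DRIVE_B
  # Confirm the new vdev shows up in the pool layout.
  zpool status tank

Keep in mind that metadata written before the vdev was added stays on the old data vdevs until it gets rewritten.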
Actually, document management servers frequently have over 100k files in one directory. Massive workloads do exist. A human may not do that directly, but applications frequently do.
I'm sure it would benefit email servers, as well.
A temperature-tracking application (food transport industry) does this as well.
Devs do crap like this all the time, where the stuff clearly belongs in a DB or something 😅
Is it good to have a metadata disk even if we use ZFS primarily as a virtualization target?
I'm curious, does this benefit iSCSI LUNs and VM disks? Say I want to use TrueNAS as an iSCSI storage target for Windows VMs, and I would also like to use an SR (storage repo) for VM disks to live on.
It's only useful for datasets, not usable for zvols.
Metadata doesn't (only) mean file metadata, in this case. Zvols also consist of metadata nodes and data nodes, and the metadata nodes do get stored on the special vdev, as well. However, you'll likely see acceleration to a lesser degree than with regular datasets. Though, I read somewhere that you may be able to use file based extents for iSCSI, which means dataset rules would apply.
@@TheChadXperience909 From what I remember you can use file extents for iSCSI on TrueNAS, but I vaguely recall hearing that some of the iSCSI benefits are lost when not using zvols.
@@cyberpunk9487 Makes sense.
@@shittubes We actually don't use ZVOLs for iSCSI LUNs, for a few reasons. We have found much better success deploying fileIO-based LUNs that we create within a ZFS dataset. I believe one of my videos here goes over this, but perhaps it's time for a good refresher on ZFS iSCSI.
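For anyone curious what a fileIO-backed LUN roughly looks like outside of the TrueNAS UI, here's a sketch using the generic Linux LIO stack (targetcli); the pool, dataset, and backstore names are made up for illustration, and the recordsize is just an example tuning:

  # Dataset to hold the LUN backing files.
  zfs create -o recordsize=64K tank/iscsi
  # Register a 100G file-backed backstore; targetcli creates the file if it doesn't exist.
  targetcli /backstores/fileio create name=lun0 file_or_dev=/tank/iscsi/lun0.img size=100G

From there the backstore gets mapped to an iSCSI target/LUN as usual. Because the extent is a file inside a dataset, its metadata follows normal dataset rules on the special vdev.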
🔥🔥🔥
Do you usually run some performance metrics on your customers' machines once they have been built out?
I feel like you could easily let the same tools run in the background to generate some exemplary "load at 10 am might look like this" numbers, which should easily show the differences.
For StarWind vSAN I used DiskSPD, which seems to have a Linux-port Git repo (YT doesn't like links; it's the first result on Google).
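On Linux, fio is the more common equivalent; here's a hedged sketch of a small-block test that stresses the kind of I/O a special vdev helps with (the directory and sizes are placeholders):

  # Random 4K reads spread over many small files on a dataset mounted at /tank/test.
  fio --name=smallio --directory=/tank/test --ioengine=libaio \
      --rw=randread --bs=4k --size=256M --nrfiles=64 \
      --numjobs=4 --iodepth=8 --runtime=60 --time_based --group_reporting

Run it once against a pool without the special vdev and once with it, and compare the IOPS and latency lines.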
If you add a metadata vdev to a pool, is it safe to remove it later? Is it a cache, or does metadata simply stop going to the data disks?
The metadata vdev houses the data about the data: things like properties, indexes, etc., essentially pointers to where the data lives in the pool/dataset. It is not a cache.
If you remove it, the data has nothing tying it to specific blocks in the pool, rendering all of the data inaccessible.
So no, it is not safe to remove.
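If you're weighing this up, you can at least see how much is actually living on the special vdev; a quick sketch (the pool name "tank" is a placeholder):

  # Per-vdev capacity and allocation, including the special mirror.
  zpool list -v tank
  # Per-vdev I/O activity every 5 seconds, to see how busy the special vdev is.
  zpool iostat -v tank 5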
Great demo. What are the risks? Do we need to mirror the special device? What happens if it dies?
You would lose the data in your pool. So yes, I'd recommend mirroring your metadata device. Also make sure that it is sized appropriately for the size of your data vdevs.
Thank you for the explanation. You mentioned that the metadata is stored on the disks and that an NVMe will help speed that up. Would you recommend adding one for an all-flash storage pool?
As he said, it will keep that load off the storage disks (or flash). Depending on the workload, it could also improve performance for an all-flash pool.
M.2 NVMe drives have lower latencies than drives connected via SATA, and often have faster read/write throughput. It would accelerate such an array, but to a lesser extent. When comparing, you should look at their IOPS.
It can create better fairness between multiple applications with different access patterns, so that a high-throughput sequential write load won't affect another workload that does mostly very small I/O, whether that's pure metadata or small block sizes (handled by the special device).
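On the small-block side, that's controlled by the special_small_blocks property; a minimal sketch, assuming a pool "tank" with a dataset "vms":

  # Store data blocks of 32K and smaller on the special vdev (0 disables this).
  zfs set special_small_blocks=32K tank/vms
  # Verify the setting.
  zfs get special_small_blocks tank/vms

Keep an eye on special vdev capacity when enabling this, since small user data then competes with metadata for that space.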
It depends on the SSD types. If you add an NVMe drive to an all-flash SATA pool, the benefits will not be as noticeable as when accelerating an HDD-only pool; the IOPS difference between NVMe drives and SATA drives is not that significant. The 4K performance of an SSD is not only related to its interface: the controller model, NAND type, and cache speed and size also play a big role in a drive's final performance.
Hi, what capacity HDDs and NVMe were used for the video? I'm terrible at reading Linux's storage capacity counters. I'm trying to work out a good NVMe capacity for my 32TB (raw) pool; is 500GB a good amount?
We used 16TB HDDs and a 1.8TB NVMe.
How much metadata is stored will vary depending on how many files are in the pool, not only on how big it is. 32TB of tiny files will use more metadata space than 32TB of larger files. So it's not always straightforward to pick the size of special vdev needed.
Okay, so where to go from here?
The rule of thumb seems to be about 0.3% of the pool size for a typical workload. This is from Wendell at Level1Techs - a very trusted ZFS guru. See this as reference: forum.level1techs.com/t/zfs-metadata-special-device-z/159954
So, in your case, 0.3% of 32TB would be 96GB; therefore a 512GB NVMe will work (there's a quick shell sketch of the arithmetic at the end of this reply). Remember, you will want to at least 2x mirror this drive and buy enterprise NVMe, as you will want power loss protection.
If you already have data on the pool, you can get the total amount of metadata currently in use with a tool called 'zdb'. Check out this thread as a reference: old.reddit.com/r/zfs/comments/sbxnkw/how_does_one_query_the_metadata_size_of_an/
You can do this by following the steps in the above thread or you can use a script we put together inspired by the thread: scripts.45drives.com/get_zpool_metadata.sh
Usage: bash get_zpool_metadata.sh poolname
Thanks for the question, hope this helps!
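P.S. If you want to sanity-check the rule of thumb yourself, a rough shell sketch (the pool name is a placeholder):

  # Raw pool size in bytes, exact and header-free.
  zpool list -Hp -o size tank
  # 0.3% of that size, reported in GiB.
  echo "scale=1; $(zpool list -Hp -o size tank) * 0.003 / 1024^3" | bc
  # Or just back-of-the-envelope for 32 TB:
  echo "32 * 1000 * 0.003" | bc    # -> 96.000, i.e. ~96 GB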
@@45Drives Thanks for the reply. All of that info will be very useful and I'll be reading those threads you linked in a minute.
Stinks that you lose the whole pool if the mirror dies. I'd want to also Z2 the special vdev.
In my experience, it really accelerates file transfers, especially when doing large backups of entire drives and file systems.
How can this improve transfer speed when only the metadata is on the NVMe?
@@steveo6023 It speeds up, because flash storage is faster at small random IOPS than HDDs. Even though they are small reads/writes, they add up over time. Also, it prevents the read/write head inside the HDD from thrashing around as much, which reduces seek latency, and can also benefit drive longevity.
@@TheChadXperience909 But metadata is cached in the ARC anyway.
@@steveo6023 That applies only to reads, and always depends.
@@steveo6023 If spinning drives can spend 99% of their time on sequential writes, they will be very fast. If, e.g., 50% of the time is spent on random writes for metadata, the transfer speed will be roughly halved. Provided the NVMe metadata handling doesn't add other unexpected delays (which I don't know; I'm only wondering whether that's the case), this could be fairly predictable in this linear way.
Unfortunately it will add a single point of failure when using only one NVMe device, as all data will be gone when the metadata SSD dies.
That's why you should always add it in mirrors, which also has the effect of nearly doubling read speeds, since ZFS can read from both sides of the mirror. The presenter is using mirrors in his example.
This is a lab env, as stated.
You wouldn't run this exact setup in prod, for various reasons 😅