This is the most thorough Proxmox tutorial I have seen so far. Please keep recording them!
This is by far the best Proxmox HA setup video I have seen. You clearly explained all HA options plus clustering.
I learned how HA works from this video, thanks!
One note - when you're migrating the VM on NFS or CEPH, you're only moving the contents of RAM from one machine to another. You're not testing the performance of NFS vs. CEPH.
Thank you for pointing this out.
Strange thing - from my testing, CEPH takes the lead in transfer time between nodes every time compared to NFS.
Thank you very much for this. When I first heard about CEPH and ZFS I looked on TH-cam for tutorials, but I didn't find much that goes over a step by step like yours. Yours is the best I've seen thus far that goes over Local, ZFS, NFS, CEPH in a Proxmox Home Server.
The best video that actually explained all of the different types of storage in proxmox and the positives and negatives of each. Great job.
Thank you so much for your video. You really covered the topic perfectly for an introduction. You started with foundation concepts and built upon those to show us how Clusters, HA, and the different storage types work. I got more value from your video in 45 minutes than I have from days of reading forum posts and watching the scattered topics others have produced.
I'm still looking for a very clear walk-through of the Proxmox networking options. What I've seen so far is too simple or too advanced. I need that ground-up approach like you've taken here.
Thanks again for your great work!
I can't consume enough of this video. Thank you MRP. Brilliant. Just love it.
Thanks for making this video available; it has a good pace, which makes it easier to understand what can be a difficult subject. Please keep recording them, as I'm sure your audience will only grow.
Fantastic video, followed it to set up my Proxmox and Ceph cluster! I should have watched it to the end first but couldn't help myself. My problem was needing additional drives to install Ceph, but I was able to do that fairly easily and resolved the lack of hardware with USB SSDs. Not perfect, as my 3 Proxmox servers were built with micro form-factor NUC style devices. 👍
I am glad it all worked out for you. Just please note, USB storage does work, but the read/write load will wear out USB storage much faster than internal HDD/SSD storage.
Hi, I watched several videos on this topic, and I agree with the previous speakers: you did a hard and very good job. Great respect, thanks from me. With your help, my Home Assistant migrates between servers, from local ZFS storage for now; next, further experiments.
Awesome video, thanks a lot. I’ve been putting off redoing my 3 node cluster, because I didn’t have my head wrapped around Ceph yet, but I think I’m ready to try it out on a test cluster. Wish me luck. 😂
Good Luck.
At first I was the same... scared to even touch that option.
Once I tried the setup a couple of times, it turned out it is not that hard; it is just the order you need to do things in, and once it is all set up, just leave it. It does almost everything by itself.
I've saved this video. I'm 100% going to be watching this when I set up CEPH this year!
Excellent video! I've been using Proxmox for a year with NFS, and a couple of days ago I started using Ceph. I watched several videos about Ceph, and yours was the clearest, even though you covered it quickly. New subscriber here!
Great video!
And VERY timely as well as I just bought three mini PCs to do PRECISELY this.
Just got Ceph up and running today as well.
Haven't started putting VMs or containers on the cluster yet -- that'll be next.
Thank you!
This was the best explanation of Proxmox storage I've yet seen. Great work! Thank you, MRP.
My dude, awesome tutorial! Awesome pace, hands on, excellently explained. This has helped me so much, thank you!
Thank you for taking the time with this video. It was the most detailed video on Proxmox I have watched.
The first time in a long time that I've learnt something new. Very good video explaining what is a complex topic. Hats off to you, sir.
Another brilliantly explained video that gives a great overview of all the storage options for the Proxmox home dabbler.
My key takeaways are:
1) running VMs on a shared storage platform like a NAS has multiple issues, like a single point of failure and extra network traffic.
2) CEPH is quite 'doable' with a bit of careful thought - and you have certainly laid out the process perfectly for us to follow.
My only concern with CEPH is a comment I recall from another channel (Jim?), that CEPH is very 'hard' on the drives and 'consumer' kit may suffer.
Only time will tell :)
I am happy that you enjoyed my video.
And yes, CEPH tanks hardware. I had an SSD Ceph pool before and I lost one of the 2TB SSD drives. It started to drop data and sometimes failed to read or write. Now my CEPH pool contains 3x 1TB NVMe drives and the nodes are connected via 2.5Gbps. So far so good. I still keep an eye on my drives' performance.
Great information, clean and clear. Keep it going. Thanks.
Very well produced tutorial on various storage options for Proxmox, presented in a straightforward style that is direct to the point and very easy to absorb and understand. Well done!
Excellent video that is easy to follow and explained very well.
I like the fact you went into great detail and depth on something that is usually glossed over by other content creators, particularly the choice of storage options meant to support HA. I used to work with Ceph, but... using SSDs increased wearout dramatically. I think I'll switch to ZFS over NFS thanks to your video.
Your effort producing this excellent video is very much appreciated. I'm sure you have set up CephFS by now, as it allows storing ISO and container images available from each node, like NFS but without the single point of failure.
Terrific video. I'm about to create a cluster to do HA, and all the links talk about shared storage, which is a single point of failure. Another link discussed ZFS replication, but that didn't interest me. Then I stumbled onto this one, and you show that Ceph isn't as involved as I thought. Thank you for taking the time to create this comparison.
A very good video, you did.
Thank you very much.
I have learned a lot about Proxmox, especially about CEPH. 👍
You just taught us using MASTER mode and everything just clicked inside my skull 😂
Great video my friend! Really detailed and accurate. This one is pure gold! :))))
Amazing video, thank you very much, you managed to clearly explain a very complex subject, this is top-level content, pls continue
Very informative, thank you very much. Keep up the good work!!!
Mr P, thanks, I have been devouring your tutorial vids. You have the best quality tutorials - please consider making a full Proxmox cluster setup for beginners, as there are so many ways to get lost. Regardless of what you produce next, thanks.
Hi,
Thank you for your comment.
I have started to write up a script for a Proxmox cluster setup "From 0 to Hero".
It is taking some time but one thing is for sure - it will happen.
I loved this video! Thanks a ton MRP!
Excellent video ! Very clear demo, much appreciated
The best video in this matter! Thank you!
love these explanations. easy to understand and follow. Thumbs up!
Thank you so much for your video, very well presented and explained
Thanks this is a very nice and well paced vide. Please keep them coming.
Thanks man this video was pretty much useful
Thank You for this video!
Nothing can be better than this!!!
Great video, thanks for all your effort!
Excellent explanations! thank you!
Thanks for high grade tutorial!
Excellent explanation! You're a very good teacher. Thanks to you, I have 2 nodes up so far and am working on the third and on shared storage. Is it OK to ask questions here? I have one about adding local storage to a node, primarily the primary node. I want to add a local NVMe drive to store and run the VMs from, and I'm not sure of the correct file system to use. Currently I am running a 64GB SuperMicro SATA DOM solid-state drive, which holds the Proxmox OS and the running VMs. I have also installed two 500GB Samsung solid-state drives which I have not yet configured and am not sure what to use for. Thank you very much!!!
I have 3 cheap N100 mini PCs. I added 256GB SATA SSDs to install Proxmox OS, and then I'm using the 500GB NVMEs that came installed with the mini PC to be the Ceph pool drives since they should be faster than SATA SSD. All important VMs & LXC containers will be installed in the Ceph pool with HA, and when I'm simply testing, I can use the local storage left over from the Proxmox install on the SATA SSD.
Excellent explanation! Congrats!
Really helpful! Thank you.
Super duper good. Don't stop!
Good job man! Signed up!
great job with the video, thank you
fantastic video!!!
Great. Thanks
Great video... I subscribed.
Great video! I am curious as to how you took the node offline to do your failover demonstration?
Great video!! At 13:26 you mention not to check the "add storage" box while creating the ZFS pool on nodes 2 & 3. Does that use the disk from node 1 to create the ZFS on nodes 2 & 3? Or do you have similar size disks on each node?
The best video ever on Proxmox. That being said, there is one question every Proxmox user avoids: is Proxmox a solution to use in production, or is it just for hobby use?
You can 100% use it for production.
The difference between the FREE repository and the PAID one is that PAID repository packages get validated by the Proxmox team before becoming available for update. With the PAID version, if something goes wrong with your Proxmox host, the Proxmox support team will help you get things sorted, whereas with the FREE version you are on your own to find a solution / fix.
So far I have helped deploy 5+ Proxmox setups that are being used in "production" environments.
@@MRPtech thanks so much for the straight answer.
very very useful video thanks
Great video, thank you very much for this!
I have a question regarding HA Cluster and CEPH.
In the best-case scenario, if I want to directly connect the 3 nodes without switches in between, should I use 2 NICs for high availability (node to node) and another 2 NICs for node-to-node CEPH connections?
Would it require a total of 4 NICs per node just to interconnect each node separately for HA and CEPH, or can I use just 2 NICs and combine CEPH with HA?
I hope I have made myself clear on my question.
Great video! - but - ... the only thing that would make it better (maybe in another video) is: how to create zvols (volumes on local ZFS) and use them as Ceph storage. That's what I am looking for.
Have you ever done such a thing? I want faster response than external drives, but I have only one M.2 SSD slot and no other interfaces.
Cool! Thanks! Greetings from Russia
Nice presentation with thorough examples. Kudos on the upcoming video on troubleshooting Ceph!!!! One of the most useful troubleshooting cases would be removing a node for repair (so it enters back into the cluster afterwards) and one for total removal.
At 34:42 you could explain a little better about the cache drive in case of an extra SSD in the system, because you mentioned it and just hovered the mouse between DB Disk and WAL Disk without explaining what they are and do.
At 37:57 you edited your video explaining why you used 32 on your production system (because the Ceph pool consists of 3x 2TB). That is not a reason. Is there a formula which you didn't mention? Even if this number has the ability to autoscale and won't cause any issues, it would be good to know the equation upon which it is based.
Hi,
I don't use them, and as I don't have enough understanding of them, I am not in a position to teach others.
Here is a paragraph from CEPH Documentation:
A write-ahead log (WAL) device (identified as block.wal in the data directory) can be used to separate out BlueStore’s internal journal or write-ahead log. Using a WAL device is advantageous only if the WAL device is faster than the primary device (for example, if the WAL device is an SSD and the primary device is an HDD).
A DB device (identified as block.db in the data directory) can be used to store BlueStore’s internal metadata. BlueStore (or more precisely, the embedded RocksDB) will put as much metadata as it can on the DB device in order to improve performance. If the DB device becomes full, metadata will spill back onto the primary device (where it would have been located in the absence of the DB device). Again, it is advantageous to provision a DB device only if it is faster than the primary device.
How PGs get calculated:
Total PGs = (OSDs x 100) / Pool Size
If I calculate my production cluster using this formula, I get PGs = 100. PGs in my production CEPH fluctuate between 50 and 120, with an average of 90.
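Spelling the arithmetic out (reading "Pool Size" here as the replica count, i.e. 3 copies, which is how the Ceph docs use the term, not the capacity in TB):

$$\text{Total PGs} = \frac{\text{OSDs} \times 100}{\text{replica count}} = \frac{3 \times 100}{3} = 100$$

The docs then suggest rounding up to the nearest power of two (128 in this case), and the PG autoscaler will adjust the number on its own anyway.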
More info here:
docs.ceph.com/en/latest/rados/operations/placement-groups/
Edit: forgot to mention, during the night when the server does almost nothing, PGs drop to 30-40.
@@MRPtech Thanks for the info. I think according to the formula your PGs would be 3 (OSDs) x 100 / 2 (TB) = 150. How did the 100 come up as a result?
The documentation says "faster drive than..." - does that include a RAM drive? Or does it have to be non-volatile? A RAM drive would be the absolute fastest and would significantly increase log, DB and metadata performance.
Great video
Thanks!
Thank you
Superb video, thanks. You make this look really easy, but I know from bitter experience it's not at all easy. I have exactly the same setup you have, but would like to add further USB SSD storage (1TB drives) to two of my nodes. My guess is this can be set up as a separate Ceph pool? I think there is little point adding it to the primary pool - replication 5 times seems overkill. If I can't, I may just try to do a separate ZFS share for Proxmox backups.
One word of warning - I 'experimented' with Pimox. I added one Pimox node to the cluster, but Ceph just didn't want to work properly. It automatically added my pooled storage to the new Pimox node. I decided I didn't want that node, so I tried to remove it. It's impossible. It ended up with complete destruction of my entire cluster and trashed my pooled data. I have backups luckily, so I suggest backing up too.
One more thing. My nodes are mini PCs. Each has two 1Gb NICs. I would somehow like to utilise both on each node - one for CEPH sync, the other for general management (assuming this will improve the bottlenecks). Do you have experience with this?
Thank you for the comment!
A couple of things I want to comment back on, if I may.
Pimox - I don't think the Arm version of Proxmox has support for CEPH. I could be wrong, but last time I tried to get 3x RPi 4s into a cluster, everything worked so-so-OK until I tried to get CEPH working.
When you want to remove any node from Proxmox and at the same time all its connections to CEPH, you need to do it in a very specific order, otherwise your whole cluster will fall apart like a house of cards. I know that because I messed up quite a few times while trying to get one node off the cluster list. I am making a list of steps you need to take to get this process done without (hopefully) any problems and will get a video done about it.
My main Proxmox home lab cluster is 3x Beelink Mini S12 Pro with N100 CPUs. Each has a 512GB NVMe, a 2TB SSD and 32GB RAM. When you are setting up CEPH, you can't configure CEPH to work with one node; it needs a minimum of 3 nodes. If one node had 3 drives, you could set all 3 drives as CEPH OSDs and make replication work that way, but this brings another problem: what if that node goes offline? CEPH will go offline too, and all this hard work to make CEPH work is pointless.
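For anyone who prefers the shell over the GUI, the rough CLI equivalent of that 3-node, one-OSD-per-node layout looks something like this. It's only a sketch: the network and device names are placeholders, and the exact options can differ between Proxmox releases, so check the pveceph man page before running anything.

```
# on every node: install the Ceph packages
pveceph install

# on the first node: initialise Ceph on the network the nodes share (placeholder subnet)
pveceph init --network 192.168.10.0/24

# on each of the 3 nodes: create a monitor and a manager
pveceph mon create
pveceph mgr create

# on each node: turn the spare disk into an OSD (placeholder device name)
pveceph osd create /dev/nvme1n1

# on one node: create the pool the VMs will use
pveceph pool create vm-pool
```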
@@MRPtech Yep, I see the madness in my logic. I think I will take the other two 1TB disks and build an OMV server somewhere else in the house!
Like you, I learned the hard way. Don't mess with CEPH. It works, and well, but if you look at it wrong, it bites you!
very good video
Very good tutorial, thanks a lot!
Great video and amazingly helpful.
My question is, how would you team network cards?
I mean, could you have 2x 10Gb interfaces act as one interface?
Could you also separate management traffic onto a separate VLAN from VM traffic or replication traffic?
I'm thinking about having two interfaces dedicated to Ceph, two dedicated to VM migration, two dedicated to VMs and two for management.
thanks very useful
Really nice video. Might be a dumb question to ask on a cluster video, but what type of storage do you recommend for a single node? Right now I only have one mini PC; is it good enough to use local storage, or maybe I can use ZFS for only one node?
At the start I too had only one mini PC. My setup back then was:
a mini PC with 1x 512GB NVMe for the Proxmox host and 1x 2TB SSD for keeping all VMs / LXCs. Backups were stored on the same 2TB SSD and, once done, synced to an outside location. If I lose the 2TB SSD, all my VMs/LXCs are gone plus all backups are gone, but I still have the option to restore everything from the outside location.
For you I would recommend the same setup. Depending on how many VMs you are planning to run, increase the backup frequency to maybe twice a day; I had mine once a night due to the amount of VMs I had going. Backup used to take 4 hours, so running twice a day wasn't an option for me. If your mini PC allows 3 drives - 1 for Proxmox, drives 2 and 3 as mirrored ZFS.
@@MRPtech Thanks, in your first setup was that second drive (2TB SSD) mounted as ZFS?
At the start I had it mounted as a simple drive inside /mnt/2tb_drive, but later I changed it to a single-drive ZFS pool when I learned one or two things about ZFS. Using ZFS even with a single drive was a great thing to have, as I was able to do snapshots, which by the way saved me from headaches more than once.
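For reference, the single-drive pool and snapshot part looks roughly like this from the shell (the device, pool and dataset names are just placeholders):

```
# create a single-disk pool on the 2TB SSD (no redundancy, same risk as a plain drive)
zpool create -o ashift=12 tank /dev/sdb

# dataset to hold VM data
zfs create tank/vmdata

# snapshot before a risky change, roll back if it goes wrong
zfs snapshot tank/vmdata@before-upgrade
zfs rollback tank/vmdata@before-upgrade
```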
Hi, what configuration should I apply to enable the minimum number of available copies to be 1 for a cluster with 2 nodes and a qdevice? Cheers.
Well, now I have an issue in HA. I had a VM migrate to another host, but the disk volume did not go with it. I shared the local ZFS drives as you showed. Each node has the ZFS storage, but doesn't have the data from the migrated node, only the information from the VM that was created on that node. So, when I try to migrate it back to the original node, it says there is no link to the original volume.
With ZFS as your VM storage, migrating a VM or LXC won't migrate the data. You need to set up Replication to make sure that all the nodes have up-to-date information. Do you have a backup of that VM?
If you do, destroy the VM and delete the VM disk. Once all that is done, you can restore from backup.
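For reference, replication can also be set up from the CLI instead of the Datacenter > Replication screen. A rough sketch only: the VM ID 100 and target node name pve2 are placeholders, so double-check the syntax against the pvesr man page.

```
# replicate VM 100's disks to node pve2 every 15 minutes
pvesr create-local-job 100-0 pve2 --schedule "*/15"

# check configured jobs and their last run status
pvesr list
pvesr status
```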
@@MRPtech Hello and thank you so much for the reply!!! And no, that's the issue: I do NOT have a backup for that VM (it's on my list to do), and it's still on the host it migrated to, without the data drive. The data drive is still intact on the original host, though. I've been hesitant to do anything because I don't want to lose it… The ZFS storage pools were already created when Proxmox was originally installed onto those nodes; not sure if that has something to do with it or not. In your video I noticed you created ZFS storage on the three nodes, and I figured that was already done in my instance. Another interesting thing: I did try to migrate it back to the original host, and besides the error about the missing volume, it also said it could not be migrated because it had a USB attachment on the original, but in fact it's there. And I wanted to mention that my plan is to create either NFS on my Synology and/or Ceph storage. I do have an additional drive on each of those hosts. Plus, I want to install Proxmox Backup Server; not sure of the method for that yet, though.
I have a question, Mr P: what if I have a cluster of 3 nodes, all storage on these 3 nodes is the same size and is configured as a Ceph storage cluster. At a later stage I want to add 3 additional nodes, which will make the cluster 6 nodes. These 3 new nodes do not have the same amount of storage as the 3 original nodes: they have less storage, but the amount of storage is the same on all 3 of these servers. Can I create a Ceph cluster for the 3 new nodes?
Any chance you can do a video on how to remove Ceph?
I tried NFS, but it kept complaining that it could not get to /mnt/pve/,
even though the process did create multiple drives on the NFS/TrueNAS server.
Love this tutorial, thank you. I am new to Proxmox, having moved away from ESXi, and I want to create a Proxmox cluster for VMs, Plex, VDI etc. At the moment I have 2 servers (one AMD, one Intel) that I want to use in the cluster. Is it best to buy a 3rd node that is used for shared storage only and have the VMs etc. run on the other 2 nodes, so if one of the 2 nodes fails, the VMs can still spin up because they are attached to the shared storage on the 3rd node in the cluster? My plan is to have node 1 with 2x 2TB NVMe drives to run VMs, node 2 with a single 4TB NVMe for VMs, Docker/Plex etc., and the 3rd node for all the storage (movies, music, ISOs, etc). I will then link the cluster using 10Gb Ethernet. Some advice would be appreciated, thank you.
For you to have HA (High Availability), where if one node goes offline the VM will automatically start on another node, you need 3 nodes minimum. The node count should be an odd number (3, 5, 7, ...) (more info: pve.proxmox.com/wiki/Cluster_Manager).
With two nodes you can migrate a VM from one node to another manually; automatic VM migration can't be done without Proxmox HA.
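A quick way to sanity-check the quorum side of this from any node's shell (just a sketch; the output layout varies a little between versions):

```
# cluster membership, vote counts and whether the cluster is quorate
pvecm status

# list of nodes as the cluster currently sees them
pvecm nodes
```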
Hey, a Mac mini, that's a good idea. Do you have a set-up video on the mini?
Cool video! How do I do that if I have already made a ZFS pool (RAID1) when installing Proxmox? Is that still possible? I can't do the step at minute 12:40 because, logically, no storage is displayed.
Greetings from Germany,
Tom
What do you mean by "already made RAID1 while installing"? You installed the Proxmox OS inside a RAID?
Hello, do you have any info on changing the CEPH OSD back address and back heartbeat network address? It seems I created the network on 1 host, but my others only have 1 network interface, so I can't create OSDs on those hosts.
Hey there - just wondering if you or anyone else has done this while hosting a UniFi controller? Since a MongoDB database is involved when self-hosting the controller, I'm wondering how well a migration from one node to another works out?
Amazing. Have you checked with Home Assistant? Does it work in case the Home Assistant node is off?
If the node goes offline, the Home Assistant VM gets migrated ... but ... I have 2 USB devices linked to the HAOS VM: Zigbee and Z-Wave dongles.
I have been looking for a workaround for that, thinking about a USB-over-IP setup where both dongles will be shared by a 4th device and HAOS will pick up the USB devices over the network.
Thanks. In my case I have two Ethernet Zigbee coordinators, so I don't have any device physically connected to the VM or machine. Another question: is it mandatory to configure a network VLAN only for Ceph? Could I use the same network that I use for everything else?
With my Proxmox cluster everything is linked via one connection, because each node has only a single LAN port. If each node had 2, I would definitely configure the 2nd port to work only for CEPH.
Is it mandatory to split CEPH data traffic from everything else - I don't think so. If you push a lot of data in and out of CEPH, data transfer speed is a priority. For me, for a home lab, it would be nice to have the best possible data transfer, but it is not at the top of my list - I just want CEPH to work.
To add to all of this: when my Proxmox cluster does a backup of VMs and sends the backup data to my Synology NAS, the data transfer goes via that one single LAN port. If at the same time I am creating a VM or LXC container and the virtual disk is stored inside CEPH, I do notice that this process takes longer compared to when the LAN port is not busy.
CEPH > OSD list has an "Apply/Commit Latency (ms)" column showing how long it takes for CEPH to sort data in/out traffic. If the LAN port is not busy, latency usually sits around double digits, but when the LAN port is doing something, that latency goes to 200-300ms.
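The same latency numbers are also available from the shell, which can be easier to watch while a backup is running (a sketch; column names can differ between Ceph releases):

```
# per-OSD commit/apply latency in ms (the GUI "Apply/Commit Latency" column)
ceph osd perf

# overall health plus current client/recovery throughput
ceph -s
```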
Thanks a lot for your helpful information. Appreciated
I confirm that CEPH affects my network. For me it was mandatory to create VLANs, otherwise some of my devices drop off the network, because my Wi-Fi master mesh is on the same network as CEPH and also on the same switch.
Does ZFS work like CEPH, or does it need less network capability?
Excellent video but I'm stuck on the "create ZFS" step. It appears that your pve-1 is a virtual machine with the ZFS target disk being another virtual disk. In my case, my Proxmox node is bare metal and the entire proxmox boot drive is three partitions: BIOS boot, EFI, and LVM. Is there an easy way to "carve out" a virtual disk from LVM to use for ZFS? I've done it on a VM on the node but am unsure if there is a way to do it on the node itself.
My Linux knowledge is mostly just little pockets of dangerous knowledge in a sea of possibilities.
I've got an external NVMe on that node that is ext4. I created a mount point for it, a directory in DataCenter, and then created a 500GB virtual disk on it in a VM. That seems to work fine. It was mostly a proof of concept with Proxmox Backup Server so I could repurpose it easily enough as the ZFS disk for my Proxmox HA proof of concept if that's a better option
Thanks for this... but which is the best, Ceph or ZFS?
I was going to use ZFS since I'm simply running a home lab. A nightly sync for ZFS would be more than adequate; I don't care if I lose 1 day of data. However, since a home lab is for learning, I've decided to use Ceph.
Is it possible to use CEPH on a SAN?
I use my ASUS router with USB and NFS (as a NAS) to quickly move things between my VMs/CTs, because the RAM needed for ZFS is so high.
Can a Ceph disk on a single node actually be a RAID? So for example each node would have 2x 2TB disks, totalling 12TB of physical disks, giving 2TB of actual CEPH, but it would allow 1 disk to die inside a node? EDIT: okay, I googled it; it does not support RAID because it is already redundant.
Yes. CEPH is redundant storage. Where RAID storage is located on one node, CEPH is spread across multiple nodes.
If I shut down one node in a cluster of 3 nodes, VMs and CTs are not moved according to the HA rule, and all other VMs and CTs on the other nodes are also down. I am using one OSD per node in the Ceph pool. How do I fix this?
Once the node is OFF, what is the Proxmox HA page showing? Quorum OK or not?
@@MRPtech HEALTH_WARN mon pve32 is low on available space; 1/3 mons down, quorum pve31,pve32; Degraded data redundancy: 41800/125400 objects degraded (33.333%), 33 pgs degraded, 33 pgs undersized. Each node is ceph monitor as well as ceph manager.
Great in-depth Video. I got up and running no problem until I tried to test migrate to my 3rd node then I received an error:
HEALTH_WARN: Degraded data redundancy: 39/3360 objects degraded (1.161%), 4 pgs degraded, 4 pgs undersized
pg 2.3f is stuck undersized for 3h, current state active+undersized+degraded, last acting [0,1]
pg 2.41 is stuck undersized for 3h, current state active+undersized+degraded, last acting [1,0]
pg 2.5a is stuck undersized for 3h, current state active+undersized+degraded, last acting [0,1]
pg 2.76 is stuck undersized for 3h, current state active+undersized+degraded, last acting [1,0]
Hi,
To me these errors show that there is an issue with the connection between nodes, or drives inside CEPH are failing to sync.
When a similar thing happened to me, I lost a 2TB SSD. The drive still works, but its performance is doomed. All 3 drives need to be in sync; when migration happens, a lot of data is pushed between nodes and drives, and while all this is happening Proxmox is relocating the VM drive. If the data connection between nodes is not speedy enough, or the drives can't read/write data fast enough, CEPH will struggle to keep up with all the block changes.
What is it showing now?
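A few commands that usually help narrow this kind of warning down (a sketch; exact output varies by Ceph release):

```
# full text of every active health warning
ceph health detail

# which placement groups are stuck undersized and which OSDs they map to
ceph pg dump_stuck undersized

# confirm every OSD is up and in, and sitting on the node you expect
ceph osd tree
```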
@@MRPtech Yep, I guess you're correct, york's OSD drive is smaller than the other 2. This could very well be the problem. Thanks for your response.
OSD drives are recommended to be the same size; from my testing, for a home lab it is not mandatory. You can have 2 OSD drives at 4TB and one at 3TB and be fine, but when you have 2x 2TB drives and one just a measly 512GB, this could lead to a disaster.
How about Linstor instead of Ceph ?...
13:30 I get as far as this, but no device shows up.
If the hardware of pve1 goes down, and suppose there is a website running on that node, how can the connection switch over, since pve1 is the master?
I have all 3 nodes set as Monitor and Manager, and I have 2 WordPress websites hosted.
If a website runs on Node 1 and the Cloudflare Tunnel runs on, let's say, Node 2 or 3 or even 1, once Node 1 dies, everything gets migrated automatically and the website is up and running again.
@@MRPtech Are you showing this configuration in detail in one of your videos? I lack knowledge of how it works on the networking side, since when you open port 443 or 80 it is for a specific IP on a specific node; this part is not clear.
Do you have port forwarding done for the site?
I have a Cloudflare tunnel pointing to my website. This way I don't need to port forward anything on my router.
Cloudflare Tunnel : th-cam.com/video/XyCjCmA_R2w/w-d-xo.html&pp=ygUabXJwIHRlY2ggY2xvdWRmbGFyZSB0dW5uZWw%3D
Let's say your website local ip is 192.1.2.3:80/443
So you need to port forward 80 and 443 to point to 192.1.2.3
If the website VM is hosted on node 1 and node 1 goes offline, the VM will auto-migrate (if Proxmox HA is set up properly) to another node. The VM's local IP won't change and your port forward rules will still apply.
What kind of backend network is required to run this well?
At least 10 Gbps dedicated network.
10 Gbps is good to have, but I think 2.5 Gbps is a good starting spot, for a home lab user at least.
Once I got everything set up, even 1 Gbps was doing OK to move stuff around. Where it fails is when I try to do a lot of things (backup VM, restore VM, migrate VM) all at the same time.
The difference between Ceph and NFS performance really needs to be considered not only in migration time but also in normal read/write speeds when operating the container. Because the storage in Ceph is local to the machine, you should have MUCH more IO than you would going over the network to your NFS machine. NFS to me is not a good answer when talking about most use cases of HA.
All 3 nodes (OSDs) are linked together using 1 Gbps, which is the same link between the cluster and the NAS.
If my intention in this video had been comparing performance between storage options, your comment would be more than valid. My intention in this video was setting up an HA cluster, and NFS does the job if you want to have an HA cluster.
As I mention in the video, NFS and CEPH performance from my testing is almost the same, but NFS has a "single point of failure", and that is why, and I quote my own video, "CEPH is the best option for you".
Doesn't Ceph allow locally stored data to be accessed locally? Admittedly some of the data may require a network link, but you could probably configure it so that your primary compute node for a given service would be almost 100% assured of having local data (the benefit of which is only realized under the assumption that local data doesn't need network IO).
This is an excellent video!
Is it possible to replicate to two NFS locations?
Yes, you can have a VM replicated to more than one node at the same time.
@@MRPtech What I mean is I want to have failover storage just in case my first NFS storage restarts or somehow fails to boot for various reasons. My pfSense is installed on a TrueNAS drive, but I want to fail over to an NFS setup on a Raspberry Pi just in case TrueNAS fails to boot. Is that possible?