Are you struggling with Argo CD synchronization order?
Took us a long time to fine-tune it to the extent that we're able to create it from scratch in one go... except when we don't do it for some time; then we almost always have to fine-tune again. Since composition functions, we started to use compositions more extensively in our stack, and together with Usages it indeed seems to solve the problem very nicely when creating as well as deleting stuff.
Yes, this is exactly what I am struggling with right now! It keeps saying stuff is out of sync, and once I manually retry, it works :)
I thought I was missing something when I had this experience, thanks for sharing. I would suffer any breaking changes in ArgoCD v3 if that were the cost to get eventual consistency!
I would say as long as you have no mutating admission controllers... eventual consistency works. Eventual consistency *would've* worked in the other scenarios as well if all mutating controllers could signal a re-admission when they come online. I guess you could hack this in in various ways... scanning pods for annotations left (or not left) by the admission controllers you know about... but yeah, it is a tricky and annoying subject when rebuilding a cluster from scratch! Often you have to repeat the process a hundred times to get the ordering 100% correct if you add it "later", after you've gradually built that stack over time...
I experienced this a long time ago. I had to define the sequence in my bootstrap process. I was always surprised that the ArgoCD videos claimed all k8s resources should just keep reconciling until everything is running, waiting for their dependencies to become ready. That never happened on my cluster. It had a complex setup with ArgoCD, running 20+ controllers with inter-dependencies, even v-cluster and applications on top, controlled by the same ArgoCD instance. Managed secrets alone imposed challenges (first CNI, then the secret controller, then ArgoCD as it used secrets, then the mesh, then ...). When bootstrapping I used a bash script (I know, there are better ways :-)) to control the sequence, and it ended up a bit messy (as expected). But the fact that sequence always mattered in bootstrapping was frustrating and caused (is causing) massive additional complexity. If Crossplane can manage that, that logic should also be used in a GitOps tool. The holy grail, a solution that can manage bootstrapping and ongoing lifecycle management with one seamless, integrative logic, does not yet seem to be out there... great, some room for improvement!
I love this. We specify resources declaratively, but still do imperative steps to help convergence.
This! I was attracted to the k8s world for its "declarative first" way, only to discover that I need to split my resources and apply them in a specific order, and that many of the tools don't care about it and tell you to download their own CLI to do things imperatively...
I totally agree... ArgoCD without adding Server-Side Apply and a couple of retries is much harder to work with. But with those, it works wonders.
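For anyone who has not used those options, here is a minimal sketch of an Application with Server-Side Apply and retries enabled; the name, repo URL, and path are placeholders, and the retry values are just an example:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-stack                    # placeholder name
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/example/my-stack   # placeholder repository
    targetRevision: main
    path: manifests
  destination:
    server: https://kubernetes.default.svc
    namespace: my-stack
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - ServerSideApply=true        # use server-side apply instead of client-side apply
    retry:
      limit: 10                     # retry a failed sync instead of giving up on the first error
      backoff:
        duration: 30s
        factor: 2
        maxDuration: 10m
```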
Good point with disaster recovery. I did undergo the exercise of bootstrapping a fresh cluster after setting up a full FluxCD repository with all cluster components. Had to define some dependencies for CRDs and a bit of ordering as well. Eventually full reconciliation of a repo to a cluster should succeed from point 0, that's an essential basis for GitOps approach.
Funny coincidence. We just today shared some frustration about Argo CD and Argo WF in our team. We recently updated from an old version to the current one, and it seems everything got worse. The application is unstable and fails for some team members. A lot of IT issues. We were disappointed to find that literally none of the things we wished to improve in Argo were addressed over that long period of time. The UI is buggy as hell and a UX nightmare. UI elements that glitch out. Namespace dropdowns which don't actually show the available namespaces but instead store everything you type in and offer that as options next time, without any UI option to clear them. Everything is persisted across tabs. You can't use Argo in multiple tabs at all. And I can go on with issues we face. On top of all that, the newest version now completely crashes for some of our team members with Chromium-based browsers. Oh, and on the topic of Argo CD sync, we had services failing to sync because of a property that Argo apparently added on the fly, which we did not specify, and it then detected that as a change. So it tried to sync something that wasn't there. I know, we shouldn't complain and should rather contribute to fix these issues...
Great video! I fully agree. This is something that had me confused in the beginning when learning Flux, as I was expecting it to behave exactly like you are describing here.
@DevOpsToolkit I don't know if this has been checked, but it looks like ArgoCD supports eventual consistency by using a combination of .spec.syncPolicy.syncOptions with Validate=false and retries. YouTube doesn't allow me links, so you can Google "akuity application dependencies argocd".
You can eventually get there with Argo CD or, at least, get close to it. That is my point though. It should not be that hard to get to one of the core propositions of kubernetes. Eventual consistency should be the default and not a combination of endless parameters and Lua scripts which only partially solve a problem that should not be a problem in the first place.
@@DevOpsToolkit Totally agree with you that the setup should be easy and not a nightmare, but to be fair to ArgoCD we should say that it's at least possible. I wasn't aware of that, at least not from this video, so I mentioned it for everyone's convenience!
Sounds like someone who had the same working week as me 😅 I have been using Argo Tools for 2 years now and love them. They do what they are supposed to do. But with complex projects, especially when entire infrastructures are built (ArgoCD + Crossplane / CRDs) it quickly becomes a pain. It would be nice if Argo CD would simply try again "later" if a CRD is missing ... I'm just too old to worry about pre/post hooks / wave sync 👴
and last but not least! once you have built your infra, always destroy and re-sync to see if the order is correct🤞
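For readers who have not set them up, sync waves are just a per-resource annotation that tells Argo CD in which batch to apply a manifest (lower waves go first); a minimal sketch with a hypothetical namespace:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: platform                        # hypothetical namespace
  annotations:
    argocd.argoproj.io/sync-wave: "-1"  # applied before resources in wave 0 and later
```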
But ArgoCD is more similar to kubectl behavior; if you have ready-to-go Kubernetes manifest files in a folder and start applying them with kubectl one by one in the order your file system shows them, you will probably face some issues because of the dependencies among them. So, I would recommend all ArgoCD users to use Helm templates for their deployments, since Helm renders all manifests in the correct order to be deployed (you will still face issues if your application's Helm template mixes with operators' CRDs, since Helm doesn't understand the proper order for those). So, having plain k8s manifest files in a folder is not the best way to use Argo for deployments, I would say.
Namespace
NetworkPolicy
ResourceQuota
LimitRange
PodSecurityPolicy
PodDisruptionBudget
ServiceAccount
Secret
SecretList
ConfigMap
StorageClass
PersistentVolume
PersistentVolumeClaim
CustomResourceDefinition
ClusterRole
ClusterRoleList
ClusterRoleBinding
ClusterRoleBindingList
Role
RoleList
RoleBinding
RoleBindingList
Service
DaemonSet
Pod
ReplicationController
ReplicaSet
Deployment
HorizontalPodAutoscaler
StatefulSet
Job
CronJob
Ingress
APIService
That's the Helm 3 install order you always need to follow. Put the manifests into a YAML file in that order and be happy. Helm can help as well.
Thanks for another great video! I definitely agree on eventual consistency for ArgoCD for resource creation, but there's also the other side of the coin: deletion order.
There, sync waves and other mechanisms like the ones you described are needed.
After all, that's also why Crossplane introduced the (still in alpha) Usages concept, which had long been requested to address issues like these.
Oh yeah. Deletion ordering is a problem on its own.
I only mentioned it in this video since it deserves a whole video on its own. This one is focused only on creation and updates.
I would like to add that Argo Workflows is also broken. The most complex part of working with Argo Workflows is creating an Argo Events trigger so that an Argo Workflow is automatically triggered when new changes are pushed to the app repository.
Good video. To add to it, there seems to be a variety of scenarios where a sync won't bounce a pod/deployment (e.g. a ConfigMap or Secret changes content but not name). This means once ArgoCD syncs, you still have to go in and manually scale the deployment down and up to restart it. Please let me know if I'm wrong on this, because it's very annoying. Essentially the selling point of GitOps / ArgoCD is that I push to Git and it takes care of the rest. In reality this is nowhere near the case.
That's more of a limitation of kubernetes than Argo CD. Argo CD will try to update resources you specify. You'll need a resource that will delete pods (e.g. a job).
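A common workaround for this, independent of Argo CD, is to hash the ConfigMap's content into a pod-template annotation so that a content change forces a rolling restart. A Helm-flavoured sketch, assuming the chart has a configmap.yaml template; names and image are placeholders:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
      annotations:
        # changes whenever the rendered ConfigMap changes, which triggers a rolling restart
        checksum/config: '{{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}'
    spec:
      containers:
        - name: my-app
          image: ghcr.io/example/my-app:1.0.0   # placeholder image
          envFrom:
            - configMapRef:
                name: my-app-config             # hypothetical ConfigMap name
```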
I kinda agree. It's difficult to express dependencies between applications since they could have circular dependencies, as you pointed out. So there should be another mechanism, like Terraform's dependency graph, to apply the changes in order, or a recursive operation that constantly retries so it does not deadlock while executing the task.
I think it's even simpler than that. Try and try again. That is the principal logic of eventual consistency.
I've noticed this issue and I agree 100 percent.
At the moment it is just a small inconvenience for me; however, if they fix this in the near future, that's going to be a great improvement.
Great video! I had a recent issue with syncing where the application gets stuck at presync running even after the job completes successfully and I have to terminate the running sync operation and start another manually! Still can't figure out why this is happening
Is there an issue/discussion open in the Argo CD project that addresses eventual consistency?
Seems like a no brainer... Great preso. I like the terminal overlays, they feel less distracting than the audio bubbles. my 2¢ - comb the mustache before the camera comes on. Would seem that ArgoCD could easily retry its resource creation at least as many times as there were resources being instantiated, so that a worst case of complete (persistent) failure would then notify, or alternatively watch for forward progress in its work queue, and if none occurred then throw the failure notification. I've seen this sequencing problem arise with chef as well... making idempotent configuration is hard.
Great video, thanks!
I have never felt that specifying sync waves and other flags was an issue. To me, it was quite understandable and I accepted it as a fact. Now, after your video on the matter, this behavior does seem off. My question now is... how do we help?
I guess that changing the default behavior is not a good option for existing users, so maybe a single flag that disables validations and retries a specific number of times would be a solution.
Hi Viktor, where Argo CD fails is exactly where Flux CD (v2) wins. It repeats the apply attempts until they work. It also reports missing requirements. As pretty as Argo CD may be, I personally find Flux CD (v2) to be the better solution. Does Flux CD need a nice "standard" UI, the same way as Argo CD? Yes, 100% yes! But is Crossplane the only solution doing eventual consistency?
Crossplane is certainly not the only solution that does eventual consistency. Most Kubernetes controllers are doing it.
If you use Timoni for your modules, I believe you can externalise a lot of this stuff to the Timoni tool. I really like that Timoni modules can have their own ordering and health checks built-in. Not sure if it does the "dumb retry" though 🤔.
Timoni can do that, but that's not what I want. I do not want to specify the order of anything. Otherwise, Argo CD does that just fine. What I want is to say "here are the manifests, work it out."
I absolutely agree. It was an awkward learning curve for me with Argo CD to discover that it did not in fact work that way. I tripped up on the crossplane provider config example several times 🙃
great video Viktor, thank you! I'd be really interested in seeing the video about the complicated / useful composition you mention in the b/w part... Are you going to make it, or is it already out?
I presented it at KubeCon last week. I'm planning to do a video but it probably won't be live soon. Lately I published more videos about Crossplane than usual and I would like to avoid this channel being focused on Crossplane, so I'll take a break from it and, in a month or two, publish something about Crossplane again. The next one will be about Databases-as-a-Service and, after that, the "big" one that combines DBaaS with applications.
Was your session at KubeCon recorded? If so, I could wait for the CNCF to release it, usually in about 3 months' time.
I fully understand your choice not to overexpose Crossplane on your channel, considering the broad audience it has and also your role at Upbound. In any case, I will wait patiently and, in the meantime, enjoy the rest of the great content, mentor Viktor!
Thanks again!
@acola777 Not sure whether that one was recorded. I did it at a day-zero co-located event. I know that the other talks I did at KubeCon proper were recorded though.
Great video!! I totally agree with what you have said. My silly suggestion: wouldn't it be amazing if you (Crossplane/Upbound) and ArgoCD joined forces to fix this issue, since you seem to already have a working solution and are not a direct competitor? It would help improve things quickly for the whole k8s community - do you agree?
Crossplane and Argo CD are working together, but in different areas. Those two are complementary rather than overlapping.
P.S. Argo CD is a great tool and I strongly recommend using it. My complaints about Argo CD are not meant to discourage people from using it but rather to show that there is room for improvement. I complain about Crossplane as well, for the same reason.
It looks like eventual consistency is easier to work with, better, and even simpler and I agree with your complaints. Now I wonder why Argo does it differently and if that has some advantages, too.
EC might possibly make it easier to create disasters by mistake (e.g. when renaming/moving stuff causes unwanted deletion of resources, etc), but that is just a feeling.
One disadvantage of eventual consistency is that it is harder to figure out when something failed. If a resource cannot be created, the system should try again later. However, it might never be able to create it. Without eventual consistency that is easy to deduce: it failed the first time. With eventual consistency you might conclude that it will not work by saying something like "if it failed 5 times it will likely never work".
I get your point of view, but I think the main argument here would be not that Argo is doing it wrong, but Kubernetes, since that's the orchestrator; it is the one doing the reconciliation. Argo never claimed it would do that; it just syncs git to the cluster API.
One of the main issues from Argo's point of view is that when an error occurs, it doesn't know what the error exactly is. It could be a dependency problem, but it could just as well be a user error, and it doesn't really have a decent way to make the distinction. I mean, at what point should it otherwise flag that an application is failing to deploy? The error condition becomes very undefined, and determining and then alerting that an application deployment fails becomes very tricky.
Use a retry backoff with a retry limit, just like kubernetes does with pods.
I'd also argue that Kubernetes operates on a promised state it considers acceptable and achievable and rejects anything which corrupts the combined state of its store. For example, a resource with an invalid apiVersion or kind.
I would say it is up to a GitOps operator to perform the imperative task of throwing manifests at the Kubernetes API until they stick, and to tell you otherwise. If the Kubernetes API accepts it, you can expect it to deliver on its promise, all things going well. If the Kubernetes API accepted invalid configuration, then you've just corrupted your cluster.
@DryBones111 the thing is that a CRD might not exist right now but it might come into existence a moment later.
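For that specific case Argo CD has a per-resource sync option that skips the dry-run validation when the CRD is not registered yet; a sketch, with a hypothetical claim kind and API group:

```yaml
apiVersion: example.org/v1alpha1              # hypothetical API group of the claim
kind: ClusterClaim                            # hypothetical kind whose CRD may not exist yet
metadata:
  name: my-cluster
  annotations:
    # don't fail the sync's dry-run just because the CRD for this kind is missing at that moment
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec: {}
```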
@@DryBones111 A crash-loop backoff at the pod level is also something that you ideally want to avoid and that shouldn't become normal, and it is already an error that's hard or impossible to monitor and alert on with certainty; people mostly take the safest route and just alert if a pod restarts more than once in the last X minutes. Making more of these types of fuzzy errors is not what we need, especially at scale.
Because expecting Argo to handle this would mean a situation where errors being thrown around is the default state, and people will start ignoring them. I also see so many edge cases and problems with this that you'd have to configure its behavior per application, so you might just as well be explicit about the order. Introspection and debugging will only become harder when there eventually is a real problem. And then we're not even talking about adding to Argo's already considerable load on the kube-apiserver and other unwanted side effects of such "magic". In my book, explicit always beats implicit anyway.
Just because it would be "nicer" in theory doesn't mean it would be a good idea in practice, especially in the current framework, where that's not what the Kubernetes architecture was designed for. And with a proper ArgoCD bootstrap architecture, these things are, in my book, a non-issue. And yes, I've helped bootstrap large clusters from scratch with Argo without a hitch (hello OpenShift 3.11 to 4.x migration).
In the first example... isn't that what the --atomic flag to helm is there to help with? :) I actually wish --atomic could be made the global default for Helm...
What will happen when deleting the CRDs before the CRs? At least with ArgoCD sync waves you can handle this behaviour. What will Crossplane do?
Crossplane right now has a deletion-ordering feature that is still alpha (not yet enabled by default). We believe that creation order is not important but deletion order sometimes is.
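For those who have not seen the alpha Usages feature mentioned above, a rough sketch of a Usage that blocks deletion of one resource while another still uses it; the kinds, API groups, and names are hypothetical:

```yaml
apiVersion: apiextensions.crossplane.io/v1alpha1
kind: Usage
metadata:
  name: cluster-used-by-database
spec:
  of:                                    # the resource being protected
    apiVersion: example.org/v1alpha1     # hypothetical managed resource
    kind: Cluster
    resourceRef:
      name: my-cluster
  by:                                    # the resource that uses it
    apiVersion: example.org/v1alpha1     # hypothetical managed resource
    kind: Database
    resourceRef:
      name: my-database
  # while "my-database" exists, deletion of "my-cluster" is blocked and retried later
```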
When Crossplane keeps trying to create the resources, is it retrying indefinitely against the provider API? Also, inside the ClusterClaim and DBClaim, did you define the dependencies between resources, like resource B depends on resource A?
I did not define any dependencies. I did define that some resources need data from others (e.g. the DB server needs a password stored in a secret).
Really nice and helpful.
I have a question about ArgoCD: what is the difference between LAST SYNC and SYNC STATUS? I always get confused by this. Is there a video I can follow on this?
Last sync is when it was synced last and sync is the status of that last sync.
@@DevOpsToolkit But sometimes the commit hash is different. That's why I am confused.
@MegaAVINASH24 sync status contains the commit hash that is currently being processed while the last sync is the last that was synced. Those can differ because it might be syncing the last commit while the previous one is already synced.
Perfection has not been reached... but luckily progress is being made.
Of course. I love Argo CD and this video is more of a suggestion than anything else.
@@DevOpsToolkit It also seems to me that your previous (similar style) video covers a bigger problem than this one.
Oh yeah. That's a bigger one. In this and the previous videos I'm trying to raise awareness not necessarily of problems but rather of areas we should invest in.
+1
I am lazy; I don't want to set sync waves.
Is the fix going to be an enterprise feature from Argo!!! 😅
I don't think it's that but rather the default behavior that persists.
I've not used Flux. Does it handle this problem set any better?
Flux has a different way to define dependencies. Still, it requires that we define them as well.
Great video!!
As always 💯
Last week it was "Kubernetes events are broken". Now ArgoCD is broken...
I'm really waiting for dependencies. The rest is OK.
ArgoCD deployment order? I still remember App of Apps of Apps of Apps of .... 😂
Great video..
Wouldn't this be solved by using a tool like Terraform? In Terraform you have your project that includes the EKS cluster, the network, and the main EKS addons (ingress-nginx, ALB, external secrets, SonarQube...). Argo comes later, synchronizing the development teams' applications in this environment.
I understand that Crossplane can help a lot, working together with Terraform, to define things in AWS and take advantage of the flow of how things work with Kubernetes. For example: I have a project with a simple AWS architecture: an API in Kubernetes, an RDS instance, and a frontend in CloudFront... Crossplane would be used to deploy those resources + IAM permissions. Terraform would previously be used to, for example, bring up the network: VPC, subnets, and so on.
The problem is similar in terraform. You still need to define the order, just in a different way.
@@DevOpsToolkit 100% this! Do you have any similar videos on Terraform? Imo it's also massively broken in many ways compared to what is marketed (e.g. plan should essentially be a dry run but more often than not fails on apply). I hate Terraform tbh but it seems to be the best we have right now.
@brk5 I got involved with Crossplane because of the issues with Terraform. Since then, I do not talk about Terraform since my rule is not to talk about the competition. That would make me biased.
@@DevOpsToolkit But isn't the idea of defining order something that must exist at some level? For Crossplane to work I would need to have a network structure and a Kubernetes cluster created, which implicitly creates an idea of "order".
Now, it really sucks having to define dependencies in Terraform and sync waves in ArgoCD... But how would I install ingress-nginx on Kubernetes using Crossplane? Could you do a tutorial on how to bring up an EKS cluster, configure Crossplane, and additionally create a setup with Crossplane to install the addons?
Thanks! I love your content.
great vid !
You are doing ArgoCD the wrong way, man. Argo wasn't built to organize all the stuff within Kubernetes; we cannot expect ArgoCD to read multiple YAML manifests from a repository and organise the whole installation process in the right order. I understand your frustration, but it can't be done this way. I personally prefer installing add-ons like external-secrets, KEDA, ingress-nginx, external-dns, Kyverno, etc. using a simple pipeline with helm upgrade --install commands. Once my k8s cluster has all the infrastructure requirements, I start using ArgoCD to deploy business applications. That's the main goal. Create Helm templates for your business applications and let ArgoCD do the job it was made for. ArgoCD renders your Helm charts, converting them into YAML manifests and installing the resources needed to deploy your business applications. At the same time it keeps your business application repositories in sync with k8s namespaces, as it should be in a GitOps approach.
I do understand that Argo CD cannot figure out the order. My point is that it should not figure out the order but apply what it can and try again later with what it could not. There should not be a need to know the order. Kubernetes is perfectly capable of doing what it can and trying again later, and all I'm asking is for Argo CD to do the same. If a pod mounts a secret and a volume, Kubernetes does not expect us to create those first and apply the pod only once they are in place; it will fail the pod or put it into a pending state and make it run later.
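To make that concrete, here is a minimal sketch of a Pod that mounts a Secret; if the Secret does not exist yet, the API server still accepts the Pod and it simply waits in ContainerCreating until the Secret shows up (names and image are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: demo
spec:
  containers:
    - name: app
      image: nginx:1.25                 # placeholder image
      volumeMounts:
        - name: config
          mountPath: /etc/config
          readOnly: true
  volumes:
    - name: config
      secret:
        secretName: demo-secret         # may not exist yet; the kubelet keeps retrying the mount
```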
What's the problem, just create 53 sync-waves and forget about it 😝
Don't take me wrong, but I think you are wrapping a lot of infra stuff inside your YAMLs and blaming the automation tool. In my practice, a separate team sets up Kubernetes (including addons and operators) [with Terraform, CDK, whatever], and application teams just use whatever tools we give them, like ArgoCD, etc. Your problem seems to be the initial eventual consistency for something complicated; in my opinion, Terraform is the tool for that. The application devs shouldn't be concerned with those things. Just my two cents.
It does not matter what I used. You'll get the same result with anything more complex that creates CRDs, applies some CRs, etc. You would have trouble using Argo CD to install Argo CD with a few projects and apps in a different cluster.
So, I think you're way off base here. First, you're using "eventual consistency" wrong; it means eventually all reads from a sharded datastore will have the same value, and Kubernetes doesn't do that. I think you're mistaking the times when developers tell k8s to keep something running for eventual consistency. Pods that are started by Jobs and CronJobs aren't restarted like the ones that are created by Deployments and StatefulSets, and you can manually create pods that behave either way, so what you see as eventual consistency is the intent of the devs deploying, not k8s. Second, when you got to the end you pointed out that the control plane never knows when to stop, which means the devs never get told the deploy fails, and that will delay them actually taking action to resolve the issue, driving up the mean time to detect. Finally, your belief that neither the deployment tool nor the developers need to know what order the resources need to be deployed in means that no one understands how the system works and everyone just hopes it will work, and a strategy of hope is a poor strategy. Someone should understand how things work so that failures can be troubleshot efficiently.
What I meant by eventual consistency is that etcd will eventually get all the resources which, in this case, come from git. I was talking about the state of etcd. So if a job is defined in git, it needs to eventually get to etcd; not the pods. Pods are what jobs manage, not me, and they're not in git.
Also, I do agree that someone should understand how things work. That is not the same as saying that we need to specify the exact order of everything. Are you specifying that secrets should be created before the pods that mount them?
You're right, but, please stop whinging. I'm half way through the video (and I'll watch it all), but your whine! OMG, stop it. Just explain things, drop the emotion, please.
hi
The issue is defining too much in a single Application. In this example the operator would be one Application and the resources that "use" the operator should be a separate Application (because they are separate; an operator should not be consuming its own CRD). Also, when defining the ArgoCD Application, simply set `spec.syncPolicy.retry.limit: -1` and it turns ArgoCD into an eventually consistent controller!
I feel that splitting into many applications things that are a single logical group is a pattern we adopted as a workaround rather than something we should do. Also, having to specify something like `spec.syncPolicy.retry.limit: -1` feels unnecessary and, also, does not fix the problem since it would just be failing indefinitely without other flags like those to ignore that CRDs do not exist, waves, etc.
@@DevOpsToolkit I also think that sometimes, if not most of the time, the "infra" team will create the `crossplane` install, XRDs/XRs and so on, which should not block for long, and then your `dev` team will create claims, maybe in another repo/project... splitting the two should solve most of the issues here.
You can also tweak the `Application` sync behaviour to retry more often over a longer period of time.
There may also be some benefits to using the (new) server-side apply, or in this case not using it. Not sure how it could affect resource validations.
I'm actually doing that with an Argo App (from an ApplicationSet) that inflates the Crossplane Helm chart using Kustomize and also includes more resources for the Providers... simply adding a sync-wave=2 does the work... but again, I'm not creating the resources there.
Did you open an issue on the ArgoCD project to add the `always reconcile` feature?
@piratemakers I'm not concerned about claims and I do agree those should be separate from compositions. My complaints were about other things. For example, I cannot put Crossplane itself and the providers under the same app even though they do belong together.
P.S. When I say that I can't, I mean that I can't without a lot of tweaking.
@@DevOpsToolkit I think splitting resources into separate Applications is pretty foundational to any sort of organisational structure: file systems have directories, Kubernetes has Namespaces, ArgoCD has Applications (though I hate the name because a lot of the time I don't use them for an 'application'). I totally agree that infinite retry with backoff should be the default, as that, together with an Application for an operator separate from an Application with a CRD, fixes the "failing indefinitely" pattern you see. You wouldn't realistically deploy an application container with its backend database container in a single pod...
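For reference, a sketch of the relevant part of an Application spec with the infinite retry discussed in this thread; treat the backoff numbers as an example rather than a recommendation:

```yaml
spec:
  syncPolicy:
    automated:
      selfHeal: true
    retry:
      limit: -1              # as suggested above: never stop retrying a failed sync
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 5m
```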