Thank you! I was looking forward to this video :)
Thanks a ton.
This is actually a very complex and really important subject which deserves even more digging into. In my workplace, we see lots of very different workload types and hardly ever proper SIGTERM handling in the application code. That creates massive challenges for the platform update/upgrade speed we need. Also, as I understand it, k8s sends SIGKILL anyway once the grace period (terminationGracePeriodSeconds) has passed, regardless of whether the workload is still processing or not. That would mean this period always needs to be set in line with the maximum shutdown processing time of the application (or am I mistaken?). It would be great to have more examples of best practices for how different workload types/code bases implement SIGTERM handling to truly achieve graceful shutdown. Spot on with the subject though!
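For reference, this is where that grace period is configured; a minimal sketch with illustrative names and values (once it expires, the kubelet sends SIGKILL regardless of what the workload is still doing):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app                      # illustrative name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      # Time between SIGTERM and SIGKILL (default 30s). Set it above the
      # application's maximum shutdown/cleanup time.
      terminationGracePeriodSeconds: 60
      containers:
        - name: my-app
          image: ghcr.io/example/my-app:1.0.0   # illustrative image
```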
Thank you for another great video! I totally agree that apps should handle OS signals. As a Java developer, shutdown hooks and InterruptedExceptions were part of my routine. In Kubernetes, though, the magic vanishes when you deal with the Pod termination lifecycle. The removal of the Pod from Endpoints objects and the sending of SIGTERM happen in parallel, so you can have a race condition where the Pod is terminated before the Endpoints object is updated. That means the Pod would still get traffic even though it is gone! That is why you may need preStop hooks to handle graceful shutdown properly.
That's true. In this case, I was more focused on the shutdown of individual Pods due to upgrades (where the rest stay), since that is a much more common scenario.
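The race described above is commonly mitigated with a short preStop sleep so the endpoint removal can propagate before the process sees SIGTERM. A minimal sketch, assuming the image ships a shell; the 10-second sleep is an arbitrary value to tune, and it counts against terminationGracePeriodSeconds:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: my-app                      # illustrative name
spec:
  terminationGracePeriodSeconds: 45 # must cover the sleep plus the app's own shutdown
  containers:
    - name: my-app
      image: ghcr.io/example/my-app:1.0.0   # illustrative image
      lifecycle:
        preStop:
          exec:
            # Give load balancers/kube-proxy a few seconds to drop the Pod
            # from Endpoints before SIGTERM is delivered to the process.
            command: ["sh", "-c", "sleep 10"]
```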
Really enjoying these thriller scripts around the topic... 😂
Thank you!
When you said the application stops accepting traffic, I guess you meant that the app should also be coded to fail readiness probes when the termination signal is received.
Those are different things. One of the first things Kubernetes does when it starts terminating a Pod is stop sending traffic to it. From there on, it waits for the response to the signal it sent to that Pod and, once it gets it or it times out, it deletes the Pod. So it's not the application code that stops receiving traffic; it's Kubernetes that stops sending traffic to the app inside the container, inside the Pod that is about to be deleted. There's nothing the app needs to do to stop receiving traffic. On the other hand, more often than not, it should respond to the SIGTERM/SIGINT signal only once it has finished processing the requests that are already in flight.
As for the readiness probes... there's not much the app should do. The Pod where it's running is already scheduled for deletion, so Kubernetes does not care much about the readiness probes at that point. It only waits for the response to the signal.
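To make "respond to SIGTERM once in-flight traffic is done" concrete, here is a minimal Java sketch using the JDK's built-in HttpServer (class name, port, and timeout are illustrative). The JVM runs registered shutdown hooks when it receives SIGTERM, so that is where the draining logic goes:

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.InetSocketAddress;

// Minimal sketch: keep serving in-flight requests after SIGTERM,
// stop accepting new ones, then exit.
public class GracefulServer {
    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/", exchange -> {
            byte[] body = "ok\n".getBytes();
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();

        // Runs on SIGTERM/SIGINT: close the listening socket, then wait
        // up to 20 seconds for exchanges already in progress to finish.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            System.out.println("SIGTERM received, draining in-flight requests");
            server.stop(20);   // must fit inside terminationGracePeriodSeconds
            System.out.println("shutdown complete");
        }));
    }
}
```

The stop(20) call disallows new exchanges and blocks until current ones complete or the delay elapses, so the Pod's grace period should be at least that long.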
Is it possible to override the SIGTERM type/value using the container environment variables?
I don't think so. It's a Linux standard, not directly related to containers, just adopted by them.
What if the request, or the user session related to the request, is long and spans multiple sessions on multiple Pods? How would this be handled?
That's up to the logic you put into your code. What matters is that, without code that processes signals, processes will be shut down almost immediately, so you should always have signal handlers.
It's confusing when you say Nix and not *nix. I assumed you meant "Unix-like" systems.
My bad. I wanted to say Linux or Unix and thought that nix would cover it.
I googled nix and didn't understand before seeing this comment 😂. Thanks for the video, and thanks @robertkozak for your comment.