You are better, really explaining based on scenarios.
Nice demo and very well explained. Please continue uploading such nice videos on Spark.
Thanks for your support :)
Is speculative execution at the job level or the stage level?
Can you upload a video on Spark optimization techniques from an interview perspective? This has been asked multiple times. Thank you.
Hi, what happens if the re-initiated task t2 completes on both worker nodes in the same time period? Which result is considered: t2 on worker node 1 or t2 on worker node 2?
Good explanation. Please put up videos on handling duplicates and null values using Spark.
Can data skew issues be handled by speculative execution?
What if I don't set the configs for speculative execution in my file before I begin my Spark application? Once I encounter a long-running task, can I simply open the config file and set the configs related to speculative execution, or do I need to kill the application and then set the configs?
Changing the config file while the job is active will not apply the changes to the job's configs. You have two options: either kill the job and retrigger it, or add an inline config command to the long-running job with an if condition, as below:
if job_time > threshold:
    # kill the current run and retrigger with the config below
    spark.conf.set("spark.speculation", "true")
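For reference, a minimal sketch of enabling speculative execution when the session is created, which is the reliable place for scheduler-level settings (the app name is hypothetical; the config keys are the standard Spark ones and the values shown are their defaults):

from pyspark.sql import SparkSession

# Enable speculative execution at application start.
spark = (
    SparkSession.builder
    .appName("speculation-demo")                    # hypothetical app name
    .config("spark.speculation", "true")            # turn speculation on
    .config("spark.speculation.interval", "100ms")  # how often to check for slow tasks
    .config("spark.speculation.multiplier", "1.5")  # how much slower than the median counts as slow
    .config("spark.speculation.quantile", "0.75")   # fraction of tasks that must finish before checking
    .getOrCreate()
)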
Hi bro, can you please do videos on clusters and their types, worker nodes, and how a cluster works in Databricks?
Good one, but I just wonder: if one task is taking time on one node, how can another node complete the same task earlier?
Thanks for your support :)
It is not about one long-running task; speculation can apply while up to 25% of the overall tasks are still pending.
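To make the numbers concrete, here is a rough sketch of the straggler rule, assuming the default settings (the task time is made up for illustration):

# With spark.speculation.quantile at its default of 0.75, Spark starts
# checking for stragglers only after 75% of a stage's tasks have finished,
# i.e. during the last 25%.
median_task_time_s = 10.0   # hypothetical median of successful task times
multiplier = 1.5            # spark.speculation.multiplier default
threshold_s = multiplier * median_task_time_s
print(threshold_s)          # 15.0: tasks still running past this become speculation candidates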
@AzarudeenShahul Right, but I think that 25% or 30%, etc., is what we configure to decide when to trigger speculative execution. So if 25% is still pending and taking time, what makes us believe that if we start it on another node it will be completed sooner? This is my doubt.
There might be many reasons for the slowdown of a task on one node. A few are:
1. Hardware degradation
2. Software misconfiguration
3. Slow data transfer
4. Network traffic issues
5. Data availability
It might be difficult to detect the cause of the slowness since the task still completes successfully. Spark does not try to diagnose and fix slow-running tasks; instead, it detects them and runs backup tasks for them.
Hope this answers your question
@AzarudeenShahul Thanks for the response, but I still feel a gap. I mean, we start a task on whichever node first has capacity for it. So in speculative execution we engage two nodes for the same task without being sure about the cause, and we don't cancel the task on the first node either. This way there is a high possibility that the task will be completed by the original node first, even though it was also started on a new node. I think in your example too, the task was completed by the first node and the task on the new node was then cancelled. So doesn't it seem like a wrong understanding of the infrastructure, or a wrong system configuration? I am trying to fill the gap between reality, expectations, and my understanding.
It's not about the capacity of the node; it's the slowness, as I mentioned above. The more you read the application logs, the more you understand. It comes through experience.
And my example is not a real project; I simulated a slow-running task using sleep, which obviously makes node 1 complete first.
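For anyone curious, a minimal sketch of that kind of simulation (the partition index and delay are made up; this is not the exact demo code):

import time
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("slow-task-sim").getOrCreate()

def process(index, rows):
    # Artificially stall one partition so speculation has a straggler to react to.
    if index == 0:
        time.sleep(60)  # hypothetical delay
    yield from rows

rdd = spark.sparkContext.parallelize(range(100), 4)
print(rdd.mapPartitionsWithIndex(process).count())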
Great videos
Very well articulated and explained. Can we consider speculative execution one of the Spark optimization techniques?
Spark itself will take care of this. But before enabling it in the config, we should be aware of our resources. If fewer resources are available in our cluster, it might degrade performance.
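As a rough illustration of that resource check (the minimum-executor threshold here is a made-up number for the sketch, not a Spark rule):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read the executor count configured at launch (the "2" fallback matches
# the usual YARN default when the key is unset).
executor_instances = int(spark.conf.get("spark.executor.instances", "2"))

MIN_EXECUTORS = 4  # hypothetical threshold for this sketch
if executor_instances < MIN_EXECUTORS:
    print("Small cluster: speculative copies may steal slots from first attempts")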
@AzarudeenShahul Thanks for the reply. Can you upload a video on Spark optimization techniques from an interview perspective?
Sure, we can plan a series on optimization.
Thanks man