Apache Spark Executor Tuning | Executor Cores & Memory

Afaque Ahmad

Views: 9,240

Published on 5 Sep 2024

Comments • 101

@dudechany 1 month ago ⁺⁶
Every time I come here before attending an interview, I try to give this video a like, but end up realising that I already did it earlier. Best video on this topic on the whole internet.
@afaqueahmad7117 23 days ago ⁺¹
This means a lot to me @dudechany, I really thank you for the generous and kind appreciation :)
@bijjigirisupraja8021 1 month ago ⁺⁵
Bro, please do videos on Spark regularly, it will be very helpful. Thank you
@deepikas7462 1 month ago ⁺²
All the concepts are clearly explained. Please do more videos.
@afaqueahmad7117 23 days ago ⁺¹
Appreciate the kind words @deepikas7462, more coming soon :)
@BabaiChakraborty-ss8pt 4 months ago ⁺¹
Man, your tutorials are the best. I have been following you for Spark tuning related videos. Thanks
@afaqueahmad7117 4 months ago ⁺¹
Thank you @BabaiChakraborty-ss8pt, really appreciate it, means a lot to me :)
@SandeepPatel-wt7ye 2 months ago
This is awesome stuff. The executor tuning concept is explained at a very granular level.
@afaqueahmad7117 1 month ago ⁺¹
Appreciate it @SandeepPatel-wt7ye, thank you!
@harshshah8884 1 month ago
@afaqueahmad7117 Quick question: let's say I have limited RAM available, like 50 GB, and want to process 1 TB of data, and no additional capacity can be added to the cluster. Based on your video, how should we choose the optimal number of executors, memory per executor, and cores per executor?
@mohitupadhayay1439 1 month ago
Really waiting to see if you can add some real world use cases to your videos to strengthen our understanding. It will be appreciated a lot man!
@yatinchadha1803 3 months ago
Thanks Afaque for this great tutorial. This will really help while working on Spark optimization. It would be of great help if you could tell us how you deal with this type of question:
Spark cluster size: 200 cores and 100 GB RAM
Data to be processed: 100 GB
Give the calculation for driver memory, driver cores, executor memory, overhead memory, and number of executors.
@afaqueahmad7117 3 months ago
Hey @yatinchadha1803, thanks for the kind words, really appreciate it. Regarding the question - after watching the video, it should be a cakewalk :)
@yatinchadha1803 3 months ago ⁺¹
@afaqueahmad7117 Can you please guide on how to calculate the driver memory and driver cores?
@leilaturgarayeva105 4 months ago
Thank you for the useful content! IRL an analyst / engineer would have access to a huge cluster which is shared between many people / teams. It would be very interesting to watch a video where you calculate the amount of resources that should be requested based on the task at hand (particular dataset, task and output). And again - thanks for helping to understand these somewhat hard to grasp concepts :-)
@AshishStudyDE 3 months ago
Great work, going good. I hope you cover two more topics: driver OOM and executor OOM, why they happen and how we can tackle them.
@adtempgupta 4 months ago ⁺¹
Thank you so much for the wonderful content. Please start PySpark sessions.
@remedyiq8034 4 months ago ⁺²
At 35:10 @afaqueahmad7117 I want to add one point. You said that execution happens in execution memory, which is 60 percent, and 40 percent is user memory. So 60 percent of 20 GB is 12 GB of memory. Of that, 50 percent is for execution and 50 percent for storage. Let's assume 50 percent is given to execution (static allocation): out of 12 GB, only 6 GB is for execution. As we have 5 cores per executor, 6 / 5 is approximately 1.2 GB of memory per core. So the maximum partition size that can be accommodated is about 1.2 GB. Is my thought process correct?
@iamkiri_ 4 months ago
Looks like this is a valid question, bro!
@afaqueahmad7117 4 months ago ⁺¹
Hi @remedyiq8034, this is a very valid point and thanks for highlighting this. You're absolutely right about ~1.2GB memory per core. My mind was referring to execution memory but I really appreciate your attention to the breakdown of the `--executor-memory` into its various components, which I should have explained more clearly before doing the memory per core calculation. I'll look into adding an info card to make this clear in the video. Thanks again for your sharp observation!
@remedyiq8034 4 months ago
@afaqueahmad7117 Thanks, I learned a lot from you. Watched all your videos. Keep doing great work for the community. Better than paid courses on Udemy!
@shaifalipal9415 19 days ago
Out of 20 GB, for execution = 0.6 x 20 = 12 GB, for storage = 0.5 x 20 = 10 GB
Execution per core = 12 / 5 = 2.4 GB
Am I missing something?
@strophariacaerulea 4 days ago
@shaifalipal9415 `spark.memory.storageFraction` (0.5) is a fraction of `spark.memory.fraction`, so 20 GB * 0.6 * 0.5 = 6 GB for execution (static) -> 6 GB / 5 cores = 1.2 GB per core.
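The arithmetic in this thread can be sketched in a few lines of Python. This is only an illustration under the thread's assumptions: a 20 GB heap, 5 cores per executor, and a fixed 50/50 split between execution and storage. The function name and the `reserved_gb` parameter are made up for this sketch. Spark itself also sets aside roughly 300 MB of reserved memory before applying `spark.memory.fraction`, and lets execution borrow from storage at runtime, both of which the thread leaves out.

```python
def execution_memory_per_core(heap_gb, cores, memory_fraction=0.6,
                              storage_fraction=0.5, reserved_gb=0.0):
    """Rough execution memory available to each core of one executor.

    memory_fraction  mirrors spark.memory.fraction (default 0.6)
    storage_fraction mirrors spark.memory.storageFraction (default 0.5)
    reserved_gb      is 0 here to match the thread; Spark reserves about 0.3 GB
    """
    # Unified (execution + storage) region of the heap
    unified_gb = (heap_gb - reserved_gb) * memory_fraction
    # Treat the storage share as fixed, as the thread does ("static")
    execution_gb = unified_gb * (1.0 - storage_fraction)
    return execution_gb / cores


per_core = execution_memory_per_core(heap_gb=20, cores=5)
print(f"{per_core:.2f} GB of execution memory per core")
```

Passing `reserved_gb=0.3` gives a slightly lower figure, which is closer to what a real executor would have under the same static-split assumption.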
@vishalpathak8266 15 days ago
Thank you for this video !!
@Amarjeet-fb3lk 3 months ago
Thanks for these videos.
I have been watching your videos for quite a while.
You explain things in a very easy and simple manner.
But,
I think in real projects we would be processing a very large amount of data,
so it would be great if you could make a video on processing large amounts of data with all the optimisation techniques we can use.
Thanks in advance.
@afaqueahmad7117 3 months ago
Hey @Amarjeet-fb3lk, Thank you so much for the kind words; they truly mean a lot! I'm delighted to hear that you find the explanations easy and simple to understand. While production/large-scale projects are in the future plans, I would like to emphasize that the fundamental concepts and optimization techniques remain the same. My goal is to help you build a rock solid understanding of these concepts so you can confidently apply them in any scenario.
@tumbler8324 24 days ago
Perfect explanation & perfect examples throughout the playlist. Brother, please also explain how Change Data Capture and Slowly Changing Dimensions are applied in a project.
@afaqueahmad7117 23 days ago ⁺¹
Thanks for the kind words, brother @tumbler8324. It will all come in some time, it's in the pipeline :)
@mayapareek2844 3 months ago
Wow! Great content! I am preparing for interviews and found this super helpful. Thanks a ton!
@afaqueahmad7117 3 months ago
Glad you're finding it helpful @mayapareek2844, heartfelt thanks :)
@purnimasharma9734 3 months ago
Hello Afaque, your tutorials are excellent and I learnt so much about optimization techniques. I am wondering if you can add some real world use cases to your videos to strengthen our understanding. It will be appreciated a lot.
@iamexplorer6052 4 months ago
Thanks for this. I am currently working on job optimization and it is very useful to me.
@afaqueahmad7117 4 months ago
Thank you, really appreciate it :)
@sankarshkadambari2742 4 months ago
Amazing is the word, you never disappoint us. Very grateful and indebted to you for this excellent content you are creating. God bless you!
@afaqueahmad7117 4 months ago
Thank you @sankarshkadambari2742, really appreciate it, means a lot to me :)
@ashutoshpatkar4891 1 month ago ⁺¹
Hey man, I learnt a lot from the video. Please help me out with this doubt.
For example 2, you said total executors = 44 / 4 = 11. But shouldn't we think machine by machine? Each machine can have 15 / 4 = 3 executors (rounded down) with 4 cores each, giving a total of 3 * 3 nodes = 9. In your workout, it seems there would be an executor that uses some cores from one node and some from another. Am I wrong somewhere in my thought process?
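The two ways of counting in this question can be put side by side. The sketch below assumes the numbers quoted in the comment (44 usable cores in total, 3 nodes with 15 usable cores each, 4 cores per executor); the function names are made up for illustration. Since an executor is a single JVM process running on one node, it cannot take cores from two machines, so the per-node count is the one a cluster manager could actually place.

```python
def executors_by_pooling(total_cores, cores_per_executor):
    """Count executors as if all cluster cores formed one shared pool."""
    return total_cores // cores_per_executor


def executors_by_node(nodes, usable_cores_per_node, cores_per_executor):
    """Count executors node by node; leftover cores on a node go unused."""
    return nodes * (usable_cores_per_node // cores_per_executor)


pooled = executors_by_pooling(total_cores=44, cores_per_executor=4)
per_node = executors_by_node(nodes=3, usable_cores_per_node=15,
                             cores_per_executor=4)
print(f"pooled estimate: {pooled}, schedulable per node: {per_node}")
```

The gap between the two numbers is the cores stranded on each node, which is one reason to pick a cores-per-executor value that divides the usable cores per node evenly.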
@yashwantdhole7645 2 months ago
Hi Afaque, it was a really nice video. Never got such detailed understanding anywhere. Do you also provide 1:1 sessions? If yes, I am highly interested.
@afaqueahmad7117 2 months ago
Hey @yashwantdhole7645, appreciate the kind words, means a lot. At this moment, I do not take 1:1 sessions, but if you have any questions feel free to shoot an email or comment here in this thread :)
@asokanramasamy2087 4 months ago
Great! If possible, please make a video on Spark Streaming as well!
@remedyiq8034 4 months ago ⁺²
Hi, can you please make a video on understanding the Spark UI or the Databricks Spark UI? There are a lot of tabs there; it's tough to understand it.
@afaqueahmad7117 4 months ago ⁺³
Hey @remedyiq8034, could you share which tabs are troubling you? The most important ones, I've discussed, sharing links below:
1. Storage tab: Caching video (th-cam.com/video/FujwRYkBwM4/w-d-xo.html)
2. SQL tab: Master Reading Spark Query Plans video (th-cam.com/video/KnUXztKueMU/w-d-xo.html)
3. Jobs/Stages/SQL - Unlock Performance With Spark DAG Mastery video (th-cam.com/video/O_45zAz1OGk/w-d-xo.html)
@seenu0104 4 months ago
Thank you very much for this amazing content with super easy explanation 👏👏
@afaqueahmad7117 4 months ago
Thank you @seenu0104, really appreciate it :)
@chitransh847 2 months ago
Sir, can you please bring a Python and SQL series for interview prep, and also the basics? The rest of the content is just great!
@afaqueahmad7117 1 month ago
Thank you, appreciate it @chitransh847, Python coming soon :)
@saineelkiranch9790 4 months ago
Excellent. Very well explained.
@afaqueahmad7117 4 months ago
Thank you @saineelkiranch9790, really appreciate it :)
@dataterre 4 months ago
Thanks Afaque, this is an excellent video to start my Saturday morning. It has been on my to-do list for the whole week. A couple of questions for you / the community, since this is very relevant to my current work.
1) Considering we are "exhausting" the cluster resources, could you explain where the driver node comes into the picture in this pool of resources (e.g. --driver-memory)? I presume a sizeable amount of driver memory is required since we tend to collect data in the driver node in a count(), etc.
2) I understand the concept of optimal executor sizing here. Suppose my application abstraction is looking at optimal Spark sessions running in parallel; then this optimal tuning would mean I can only run 1 spark-submit job in the entire cluster, right?
Excellent video, again
@afaqueahmad7117 4 months ago
Hi @dataterre, thank you for the kind words, means a lot to me :) On the questions:
1. Indeed, a reasonable amount of cores and memory is required for the driver because it is the one coordinating the lifecycle of the application, managing communication, and creating and scheduling tasks to be executed on executors. However, in this video, with the specific focus being on "executor" tuning, driver resource allocation is skipped, but it's important to note (as you rightly pointed out) that the driver will need resources for its own functioning / executing its responsibilities + collecting data as a result of actions (count(), show() etc.). I would think of subtracting out an appropriate number for driver cores and memory from the total cluster cores/memory and then doing the executor sizing discussed in the video.
2. Yes, this example assumes you're taking up the whole cluster for best utilization. However, if you're looking to run multiple Spark sessions in parallel, you could do the following:
a. Enable dynamic allocation (by setting `spark.dynamicAllocation.enabled` to `true`) to allow each session to use resources.
b. Define a reasonable minimum and maximum number of executors per application (by using `spark.dynamicAllocation.minExecutors`, `spark.dynamicAllocation.maxExecutors`)
c. Adjust `spark.executor.cores` and `spark.executor.memory` using the principles/rules as discussed (in video), to ensure that each application gets enough resources to perform efficiently but not so much that it monopolizes cluster resources
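Steps a to c can be collected into one set of settings. The sketch below is a minimal Python helper that builds the properties named in the reply and renders them as `--conf` arguments for `spark-submit`. The helper names and the example values (2 to 10 executors, 4 cores, `8g`) are placeholders chosen for illustration, not recommendations from the video. Dynamic allocation usually also needs an external shuffle service or shuffle tracking to be enabled on the cluster, which is left out here.

```python
def dynamic_allocation_conf(min_executors, max_executors,
                            executor_cores, executor_memory):
    """Build the Spark properties mentioned in steps a to c."""
    if min_executors > max_executors:
        raise ValueError("min_executors must not exceed max_executors")
    return {
        "spark.dynamicAllocation.enabled": "true",                   # step a
        "spark.dynamicAllocation.minExecutors": str(min_executors),  # step b
        "spark.dynamicAllocation.maxExecutors": str(max_executors),  # step b
        "spark.executor.cores": str(executor_cores),                 # step c
        "spark.executor.memory": executor_memory,                    # step c
    }


def to_spark_submit_args(conf):
    """Render a properties dict as repeated `--conf key=value` arguments."""
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Placeholder values; size them with the rules from the video.
conf = dynamic_allocation_conf(min_executors=2, max_executors=10,
                               executor_cores=4, executor_memory="8g")
print(" ".join(["spark-submit"] + to_spark_submit_args(conf)))
```

Capping `maxExecutors` per application is what keeps one session from taking the whole cluster while still letting it grow when the cluster is idle.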
@shaifalipal9415 19 days ago
@afaqueahmad7117 So in this case, if we have 10 GB of data, how do we decide how much memory and how many cores to assign to the driver?
@leonardopetraglia6040 1 month ago
Correct me if I'm wrong, but these calculations consider the execution of only one job at a time. How do the calculations change when there are multiple jobs running in a cluster, as often happens?
@Amarjeet-fb3lk 3 months ago
Hi @Afaque
I watched this video previously, and I am still watching many more videos that cover Spark memory management, and reading articles on Spark memory and partitions.
So here are some points that I have learnt.
1. Memory for each core should be 4 times 128 MB.
2. The total number of partitions should be 4 * number of cores.
But,
how should we decide the number of partitions, each partition's size, and the memory for each core?
Because these things will change according to our data.
So, can you answer these 3 questions?
Thanks.
@ComedyXRoad 4 months ago
Thanks for the content and your efforts.
@afaqueahmad7117 3 months ago
Thank you @ComedyXRoad, appreciate the kind words :)
@maheshmahadev9918 4 months ago
Great explanation, thanks! I have a question: can you explain the basis for choosing these numbers? Is it based on the incoming data that needs to be processed? In that case, for the calculations in this video, what data size is considered? Thanks again
@afaqueahmad7117 4 months ago
Hey @maheshmahadev9918, the numbers for the cluster (X nodes, Y cores, Z RAM) are for illustration and independent of the incoming data size. As discussed at 34:06, the reason I'm not talking about incoming data sizes is that they should be tailored to the "memory per core". The most granular unit of data is going to be a "partition", and as long as the core has enough memory to process that partition, things will run fine. I would suggest re-watching from 34:06 if unclear :)
@ajaydhanwani4571 1 month ago
Sorry if I am asking a very basic question: can we set executors per Spark job or per Spark cluster? Also, how do we set this up, with coding examples?
@ShubhamWakshe-e4c 2 months ago
You talked about the YARN application master. Is it the driver that contains the application master container? That means we are assigning driver memory as 1 GB, right?
@atifiu 4 months ago
Thanks Afaque for this video. I have a question regarding task-level and executor-level parallelism. As per my understanding, 1 partition = 1 task = 1 core/thread, so how is task-level parallelism achieved when 1 task is assigned to only one core? That would mean that within an executor the remaining 46 cores will not be utilized if the number of tasks is, say, only 5.
@ShubhamWakshe-e4c 2 months ago
If we are already allotting 1 core and 1 GB of RAM for YARN/OS daemons, then why do we need to allot a separate 1 core and 1 GB, or one executor, for the YARN resource manager?
@Wonderscope1 3 months ago
I really enjoy your videos. Thanks for sharing your knowledge.
I have a question about how you create these videos. It is an amazing way to create tutorial videos. Do you mind sharing what tools you use to make these videos?
Thanks
@afaqueahmad7117 3 months ago ⁺¹
Thank you @Wonderscope1, really appreciate it. I use Notion and Miro :)
@Wonderscope1 3 months ago
@afaqueahmad7117 I am familiar with Notion as a project management tool; I didn't know it can help with video production. I need to look into that. Thanks 😊
@afaqueahmad7117 3 months ago ⁺¹
Sorry, I meant Notion for the code snippets. I use Ecamm Live for video production :)
@Wonderscope1 3 months ago
@afaqueahmad7117 Perfect, that's what I was looking for. Thanks :)
@naveenreddybedadala 2 months ago
Will that final actual executor memory again be split into user, reserved, unified, and overhead memory?
@rohitdeshmukh7274 2 months ago
Very informative video. I have one question. I have a Databricks cluster with auto scaling enabled. Will the calculations change in that case?
@adusumillisudheer2772 2 months ago
Same question from me. When autoscaling is enabled, how will it tune the workers and the executors inside them?
@roshankumargupta46 4 months ago
Hi Afaque! Can you confirm whether I'm wrong here? Do thin executors promote more parallelism than fat executors? In the case of thin executors, the number of executors will be higher, resulting in more individual cores, which will eventually promote parallelism. Whereas with fat executors, all cores will be consumed by the executors, which may lead to wastage of resources.
@suresh.suthar.24 3 months ago
Wonderful explanation Ahmad. I have one doubt: in your example, 23 GB of memory will be assigned to each executor, and then 10% will be excluded for overhead memory, so we are left with about 20 GB of memory per executor. So this 20 GB is on-heap memory, and it will be divided into reserved memory, storage memory, and execution memory.
Am I right or wrong? Please reply. I have asked my seniors this question but they don't have an answer.
Thank you in advance!
@afaqueahmad7117 3 months ago
Hey @SS1251, You're correct! The 20GB of memory is indeed on-heap memory and it will be divided respectively into reserved, storage, and execution memory. The memory defined through `--executor-memory` or `spark.executor.memory` is the one allocated to on-heap. You can refer to this video to get a better understanding: th-cam.com/video/sXL1qgrPysg/w-d-xo.html :)
@tridipdas5445 13 days ago
What if the nodes are of unequal size?
@9666gaurav 7 days ago
Is this applicable to cloud platforms?
@satheeshkumar2149 4 months ago
How much memory and how many cores should we set aside for the internal stuff if we have a standalone cluster instead of YARN?
@swapnilpatil18 3 months ago
Hi, in the case of the fat executor we assigned all of the remaining 47 GB to the executor (1 GB for Hadoop/YARN ops). In this case, where will the executor overhead memory come from?
@afaqueahmad7117 3 months ago
Hey @swapnilpatil18, Good question. In the initial parts of the video (before explaining the 4 rules to size an optimal executor), the goal of explaining fat executors was only to point out that they take up a large portion of the memory on a node, and that was the rationale for not separating out the respective parts i.e. overhead memory, AM memory.
However, your understanding is absolutely correct. The ideal calculation should involve subtracting max(384MB, 10% of 47GB) = max(384MB, 4.7GB) = 4.7GB per executor before calculating the `--executor-memory`.
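The overhead rule quoted in this reply is easy to check numerically. The sketch below follows the reply's formula, max(384 MB, 10% of the memory available to the executor), and then subtracts the result to get a value for `--executor-memory`. The function names are illustrative and all sizes are in MB. Spark's own default applies the 10% factor to the configured executor memory rather than to the container total, so treat this as the video's approximation rather than Spark's exact accounting.

```python
MIN_OVERHEAD_MB = 384  # floor used in the reply


def memory_overhead_mb(memory_mb, overhead_factor=0.10):
    """Overhead as in the reply: the larger of 384 MB and 10% of the memory."""
    return max(MIN_OVERHEAD_MB, overhead_factor * memory_mb)


def executor_memory_mb(available_mb, overhead_factor=0.10):
    """Memory left for --executor-memory after setting the overhead aside."""
    return available_mb - memory_overhead_mb(available_mb, overhead_factor)


available_mb = 47 * 1024  # the 47 GB fat executor from the question
print(f"overhead: {memory_overhead_mb(available_mb) / 1024:.1f} GB")
print(f"--executor-memory: about {executor_memory_mb(available_mb) / 1024:.1f} GB")
```

For small executors the 384 MB floor dominates: a 1 GB executor still sets aside 384 MB, which is one more cost of very thin executors.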
@vikastangudu712 4 months ago
Great video, thanks for the explanation.
But how would a fat executor improve data locality?
A node can be broken into 11 executors or 1 executor; the HDFS storage or any other storage within the node is still the same for all the executors inside the node.
Data locality is about storage, not memory. Thus fat vs. thin should have no effect on data locality.
@rambabuposa5082 4 months ago ⁺¹
Because a fat executor has more memory, it can store more partitions of your dataset, so less shuffling of data is required, and it also increases data locality (i.e. most of its required partitions are stored within that fat executor).
@afaqueahmad7117 4 months ago
Hey @vikastangudu712, you're correct in saying that data locality talks about "storage". However, what I'm referring to is that the interplay with "memory" becomes important once data is loaded in memory, in the sense of "how much" data can be processed without having to go through the overhead of loading data from disk again. Several operations are going to benefit from this "memory" locality.
In Spark, the best form of locality is `PROCESS_LOCAL`, which means that the data required for a task is present in the memory of the same JVM. Therefore, fat executors occupying most of the memory of the node would benefit in this case, given that the chances of data being present in the same JVM increase.
Hope this clarifies :)
@Amarjeet-fb3lk 3 months ago
Hi, I watched this video till the end.
Very good explanation.
But I have the doubts below.
If the number of cores is 5 per executor,
at shuffle time, by default it creates 200 partitions. How will those 200 partitions be created if the number of cores is less, because 1 partition will be stored on 1 core?
Suppose that
my config is 2 executors, each with 5 cores.
Now, how will it create 200 partitions if I do a group by operation?
There are 10 cores, and 200 partitions are required to store them, right?
How is that possible?
@afaqueahmad7117 3 months ago
Hi @Amarjeet-fb3lk, thanks again for the kind words. Regarding your question, you're right in stating that 1 partition will be processed by 1 core. The configuration you shared has 2 * 5 = 10 cores in total, but it is not necessary for the number of cores to match the number of partitions exactly at any given moment. Spark will create 200 partitions during shuffle by default and it will manage the execution of those 200 partitions by scheduling the tasks in chunks based on resource availability: first 10 partitions are assigned to the 10 cores, and once those cores are freed, the next 10, and so on until all 200 partitions are processed.
@Amarjeet-fb3lk 3 months ago
@afaqueahmad7117 Thanks for your response Afaque. Learning and going deep into the topics is bringing me lots of doubts and questions.
Thanks for the answer, highly appreciate that.
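The scheduling described in this thread comes down to a ceiling division. The sketch below assumes one task per partition and one core per task (the default `spark.task.cpus` of 1), with 2 executors of 5 cores each as in the question; the function name is illustrative. Real tasks finish at different times, so the "waves" are an idealisation of how the scheduler keeps the 10 task slots busy.

```python
import math


def task_waves(num_partitions, executors, cores_per_executor):
    """Idealised number of rounds needed to run one task per partition."""
    slots = executors * cores_per_executor  # tasks that can run at once
    return math.ceil(num_partitions / slots)


waves = task_waves(num_partitions=200, executors=2, cores_per_executor=5)
print(f"200 shuffle partitions on 10 cores need about {waves} waves")
```

Partitions waiting for a free core are simply pending tasks; they do not need a core of their own in order to exist.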
@maheshh1695 4 months ago
Hi, thanks for sharing the information.
In the fat executor case, since we have 5 nodes and each node has only one executor, the number of cores should be 5 * 11, i.e. 55 cores, right?
@afaqueahmad7117 3 months ago
Hey @maheshh1695, total cores will be 55 while cores per node is 11.
@iamkiri_ 4 months ago
Awesome :)
@afaqueahmad7117 4 months ago
Thank you @iamkiri_, really appreciate it :)
@muhammadzakiahmad8069 1 month ago
Please make one on AWE as well
@afaqueahmad7117 23 days ago
You mean AWS?
@muhammadzakiahmad8069 23 days ago ⁺¹
@afaqueahmad7117 Sorry, it was supposed to be AQE (Adaptive Query Execution).
@afaqueahmad7117 23 days ago
Complete details on AQE are here below :)
th-cam.com/video/bRjVa7MgsBM/w-d-xo.html
@muhammadzakiahmad8069 23 days ago
@afaqueahmad7117 Thanks🌟
@wreckergta5470 4 months ago
Thanks
@afaqueahmad7117 4 months ago
Appreciate it, @wreckergta5470 :)
@rambabuposa5082 4 months ago
Hi @afaqueahmad7117
At 35:30, you were discussing "memory per core", which was 4 GB per core. If we have partitions of 128 MB or 256 MB with this 4 GB per core configuration, does that mean inefficient utilisation of resources (memory)? One core can process up to 4 GB but the partition size is much smaller.
Do we need to reduce the "memory per core" to get better performance and efficient utilisation of resources?
Many thanks
@afaqueahmad7117 4 months ago
Hey @rambabuposa5082, Good question! 4GB per core was just an example. If the partition sizes are 128MB or 256MB, then this would indeed be underutilising the cluster. You could reduce the memory per core, leaving some room for overhead (maybe 400MB per core for a 256MB partition); however, it's important to keep in mind the 4 rules of the game as discussed (e.g. keeping number of cores
@remedyiq8034 4 months ago ⁺¹
@afaqueahmad7117 I want to add one point. You said that execution happens in execution memory, which is 60 percent, and 40 percent is user memory. So 60 percent of 20 GB is 12 GB of memory. Of that, 50 percent is for execution and 50 percent for storage. Let's assume 50 percent is given to execution (static allocation): out of 12 GB, only 6 GB is for execution. As we have 5 cores per executor, 6 / 5 is approximately 1.2 GB of memory per core. So the maximum partition size that can be accommodated is about 1.2 GB. Is my thought process correct?
@afaqueahmad7117 4 months ago ⁺¹
Copying the same answer as in the previous comment for the community :)
"""
Hi @remedyiq8034, this is a very valid point and thanks for highlighting this. You're absolutely right about ~1.2GB memory per core. My mind was referring to execution memory but I really appreciate your attention to the breakdown of the `--executor-memory` into its various components, which I should have explained more clearly before doing the memory per core calculation. I'll look into adding an info card to make this clear in the video. Thanks again for your sharp observation!
"""
@tushibhaque863 2 months ago
Thanks, and please provide contact details. Also, do you take classes?
@afaqueahmad7117 2 months ago
Hey @tushibhaque863, appreciate the kind words. At this moment, I do not take classes, but if you have any questions feel free to shoot an email or comment here in this thread :)
@ranvijaymehta 4 months ago
Thank you
@afaqueahmad7117 4 months ago
Appreciate it, @ranvijaymehta :)
