Thank you very much for the detailed explanation; it gave me a very good understanding of how these properties help in running a Spark job. I really appreciate your help in educating the tech community 👏👏
Thank you, sir! Namaskaaram (greetings)!
Very useful video, brother. Thanks so much! Brother, I request you to please make a single video on a real-time project as it is done in industry. As a continuation, please make another video on what sort of questions we get about that same real-time project in real interviews. Please, please make a video on this. Thanks in advance.
Superb explanation, bro. Thanks a lot!
Hi! Great content! I'm wondering how the YARN container vCPU and memory sizes work with executors.
Do executors themselves run in parallel in Spark, or is it just the tasks within them?
I have a 250 GB file to process and I used dynamic allocation. When I try to run the job, it fails with the error "job aborted due to stage failure". How do I fix this issue?
Clearly explained
Just Amazing
Can we say that cores are really the available threads in Spark,
since a core can run multiple tasks?
So it's not always one core for one task;
a core can multitask.
Can you confirm this?
Thank you
This is great. Thanks!
Good explanation 👌
If the number of cores is 5 per executor:
at shuffle time, Spark creates 200 partitions by default. How will those 200 partitions be created if there are fewer cores, given that 1 partition is stored on 1 core?
Suppose that
my config is 2 executors, each with 5 cores.
Now, how will it create 200 partitions if I do a groupBy operation?
There are only 10 cores, but 200 partitions would need 200 cores to store them, right?
How is that possible?
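A minimal arithmetic sketch of how this usually plays out, assuming the defaults spark.task.cpus=1 and spark.sql.shuffle.partitions=200 and tasks of roughly equal duration: a partition is not pinned to a core. Each core runs one task (one partition) at a time, and Spark hands the next pending partition to whichever core frees up, so 200 partitions on 10 cores run in about 20 waves. The helper name task_waves is hypothetical, not a Spark API.

```python
import math


def task_waves(num_partitions: int, executors: int, cores_per_executor: int, task_cpus: int = 1) -> int:
    """Approximate number of scheduling waves, assuming all tasks take about the same time."""
    # Concurrent task slots = executors * (cores per executor // CPUs needed per task)
    slots = executors * (cores_per_executor // task_cpus)
    # Each slot processes one partition at a time; the rest wait for a free slot
    return math.ceil(num_partitions / slots)


print(task_waves(200, executors=2, cores_per_executor=5))  # 20
```

In practice task durations differ, so the waves overlap rather than start in lockstep, but the number of concurrent tasks never exceeds the 10 slots.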
You have great teaching skills. Kudos!
Can we use a SparkSession on a worker node? I'm facing an issue accessing the Spark session on worker nodes. Please help.
Hi, is it possible to create multiple executors on my personal laptop, which has 6 cores and 16 GB of RAM?
I have applied 4x memory for each core for a 5 GB file, but no luck. Can you please help me resolve this issue?
Road map:
1) Find the number of partitions -> 5 GB (5120 MB) / 128 MB = 40
2) Find the CPU cores for maximum parallelism -> 40 cores (one per partition)
3) Find the maximum recommended CPU cores for each executor -> 5 cores per executor on YARN
4) Number of executors = total cores / executor cores -> 40 / 5 = 8 executors
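The four steps above can be sketched as plain arithmetic. This is a minimal illustration, assuming a 128 MB split size (the default of Spark's spark.sql.files.maxPartitionBytes), one core per partition for maximum parallelism, and 5 cores per executor; plan_executors is a hypothetical helper, not a Spark API. Note that 5 GB is 5 * 1024 = 5120 MB.

```python
import math


def plan_executors(file_size_mb: int, partition_size_mb: int = 128, cores_per_executor: int = 5):
    """Rule-of-thumb executor sizing following the road map (not an official Spark formula)."""
    # Step 1: number of input partitions (round up for a partial last split)
    partitions = math.ceil(file_size_mb / partition_size_mb)
    # Step 2: one core per partition gives maximum parallelism
    total_cores = partitions
    # Steps 3-4: divide the cores into executors of cores_per_executor each
    executors = math.ceil(total_cores / cores_per_executor)
    return partitions, total_cores, executors


print(plan_executors(5 * 1024))  # (40, 40, 8)
```

Rounding up with math.ceil keeps the plan valid when the file size is not an exact multiple of the split size.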
Amount of memory required
Road map:
1) Find the partition size -> 128 MB by default
2) Assign a minimum of 4x memory for each core -> what exactly should I apply here???
3) Multiply it by the executor cores to get the executor memory -> ???
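A minimal sketch of the memory arithmetic in this road map, assuming the 128 MB partition size, the 4x-per-core multiplier, 5 cores per executor, and the 8 executors from the 5 GB example. The 4x multiplier is a rule of thumb, not a Spark setting, and plan_memory is a hypothetical helper.

```python
def plan_memory(partition_size_mb: int = 128, memory_multiplier: int = 4,
                cores_per_executor: int = 5, executors: int = 8):
    """Rule-of-thumb executor memory sizing following the road map (not an official Spark formula)."""
    # Step 2: each core gets memory_multiplier times one partition
    memory_per_core_mb = partition_size_mb * memory_multiplier
    # Step 3: executor memory = memory per core * cores per executor
    executor_memory_mb = memory_per_core_mb * cores_per_executor
    # Total heap across all executors in the application
    total_memory_mb = executor_memory_mb * executors
    return memory_per_core_mb, executor_memory_mb, total_memory_mb


print(plan_memory())  # (512, 2560, 20480)
```

So for the 5 GB example this gives 512 MB per core and 2560 MB (2.5 GB) per executor. These numbers would map to spark-submit's --num-executors, --executor-cores and --executor-memory options. On YARN, Spark also requests memory overhead (spark.executor.memoryOverhead) on top of --executor-memory for each container, which this sketch ignores.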
Love your content 😊
thanks
Nice explanation 😊✌️
nice explanation
What configuration will be required for 250 GB of data?
How does Spark get its metadata?
Can you explain why Spark spills to disk and what causes it? I understand that in a wide transformation or a groupByKey, where the data is too big to fit in memory, Spark has no choice but to spill it to disk. My question is whether we can minimize this with performance tuning such as bucketing, map-side joins, etc.
We can increase the number of shuffle partitions, and we can also adopt the salting technique to increase the number of unique keys (higher cardinality) and avoid skew.
If none of that works, we can increase executor cores or memory.
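A minimal plain-Python illustration of the salting idea from the reply above (not PySpark code): a random salt in the range 0..num_salts-1 is attached to each key, so one hot key becomes several sub-keys that can land in different shuffle partitions. A first aggregation runs on the salted keys, and a second aggregation strips the salt and combines the partial results. The function name salted_count and the sample data are hypothetical.

```python
import random
from collections import Counter


def salted_count(records, num_salts, seed=0):
    """Two-stage count per key using salting, to spread a hot key across sub-keys."""
    rng = random.Random(seed)
    # Stage 1: count by (key, salt); a hot key is split into up to num_salts sub-keys
    partial = Counter()
    for key in records:
        partial[(key, rng.randrange(num_salts))] += 1
    # Stage 2: drop the salt and merge the partial counts back per original key
    final = Counter()
    for (key, _salt), count in partial.items():
        final[key] += count
    return partial, final


records = ["hot"] * 1000 + ["cold"] * 10
partial, final = salted_count(records, num_salts=8)
print(final["hot"], final["cold"])  # 1000 10
```

The trade-off is an extra aggregation step: salting helps when one key dominates, but adds work when the keys are already evenly distributed.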
Can you please explain this video in Tamil? It would be very helpful for me. Thank you.
Gold
Thanks so much! It was a really simple and excellent explanation of those configs.