25. Databricks | Spark | Broadcast Variable | Interview Question | Performance Tuning
- Published on Feb 4, 2025
- #BroadcastVariable #DatabricksOptimization #SparkOptimization #Broadcast #DatabricksInterviewQuestions #SparkInterviewQuestions #DatabricksInterview #DatabricksPerformance #Databricks #DatabricksTutorial #AzureDatabricks #PySpark #Spark #AzureADF #LearnPyspark #LearnDatabricks
You should be among the top YouTubers for Apache Spark and PySpark tutorials. Awesome sir, brilliant. Thank You Thank You Thank You....
Thanks Ramandeep!
A passionate teacher... hats off! Keep updating; this is a contribution to India's growth. Heartfelt thanks.
Thanks a ton!
Finally found one person who can explain Broadcast variable in a clear and understandable way.
Huge respect bro.
Subscribed and off I go to other videos in the playlist :)
Thanks and welcome!
@rajasdataengineering7585 Do you have these notebooks saved somewhere in your Git repo, etc.?
Wonderful Raja!
Thank you! Keep watching
Good job... Keep posting interview questions on Databricks and Spark... I have shared your channel in my group.
Thanks Siva...will post interview questions
Hi sir, will you provide training on PySpark?
Very useful nice explanations.
Glad it was helpful!
Thank you for your detailed video.
insightful and precise
Glad it is helpful! Thanks for your comment
Very useful..keep going!
Thank you Roshan
Great explanation!
Glad it was helpful!
you are absolutely great!
Thank you!
Thank you for such clarity. But I have a query: since the Catalyst Optimizer will consider a broadcast join by itself if a table is small enough to fit in memory, even when we haven't requested one, is applying the broadcast join manually really going to help with performance optimization? Or will the performance remain the same even after applying the broadcast join?
The Catalyst optimiser won't always pick a broadcast join on its own: it auto-broadcasts only when a table's estimated size is below spark.sql.autoBroadcastJoinThreshold (10 MB by default). Beyond that, either we need to apply the broadcast hint manually or adaptive query execution needs to be enabled (AQE is enabled by default in recent Spark versions).
Good to know!
excellent
Thank you! Cheers!
Hi Raja, I have a few doubts. 1st doubt: once data is cached on all worker nodes, if any new records are added to the dim table, do we need to broadcast again?
2nd doubt: once the join is completed, can we clear the data from each executor?
Hi Raja, could you please also make a video on the accumulator variable?
Hi Himanshu, sure will make a video on accumulator
Sir, I have a doubt: are broadcast variables and broadcast joins different or the same?
Both work on the same principle: the small dataset is copied to every executor. The broadcast variable is the low-level API; a broadcast join uses the same mechanism under the hood.
Hi Raja, it covers only the broadcast join part, not the broadcast variables part. Please include that part also.
Thank you for your wonderful playlist on Apache Spark. Can you please explain the difference between broadcast variables and broadcast joins? Are they the same?
Yes, they are based on the same mechanism: a broadcast join ships the small table to every executor the same way a broadcast variable does.
Hi, thanks for the videos. Can you explain checkpoints: what are they, and how are they useful in optimizations?
Checkpoint is mainly used in two places in Spark: one is Spark optimization and the other is Spark streaming.
Your question is about Spark optimization. There it is quite similar to persist, which stores the dataframe on disk. The only difference is that persist retains the lineage, while checkpoint removes the lineage once the data is saved to disk.
@rajasdataengineering7585 Thank you! Please go ahead and explain the checkpoint in streaming as well, I'd really appreciate it!
In streaming, a checkpoint is a location where Spark maintains metadata about processed data, such as offsets.
So when there is a failure in streaming execution, Spark can work out which data it has already processed and where it needs to resume from.
Hiii Raja, Good content !!
The table is broadcast and stored on all nodes, but in what part of memory: is it on-heap memory, or off-heap memory managed by the OS?
thank you
Thanks Sohel!
It's stored within on-heap memory
@@rajasdataengineering7585 thanks Raja 👍
@rajasdataengineering7585 If we persist with storage level MEMORY_AND_DISK and spark.memory.offHeap.enabled set to true, will the data spill to off-heap or directly to disk?
Also, I read that "that data structure can't be split when it's spilling somewhere". What does that mean?
I appreciate your response. thank you :)
Thank you
You're welcome
Good stuff. Can you please share the code or keep a copy in Git so that we can use it for our learning?
Sure, will do.
It would be great if you could provide the scripts.