Hadoop vs Spark | Lec-3 | In depth explanation
- Published on Oct 3, 2024
- In this video I have talked about Apache Spark vs Hadoop, covering the differences in detail. If you have any doubts, please shoot your questions in the comment section.
Directly connect with me on:- topmate.io/man...
For more queries, reach out to me on my social media handles below.
Follow me on LinkedIn:- / manish-kumar-373b86176
Follow Me On Instagram:- / competitive_gyan1
Follow me on Facebook:- / manish12340
My Second Channel -- / @competitivegyan1
Interview series Playlist:- • Interview Questions an...
My Gear:-
Rode Mic:-- amzn.to/3RekC7a
Boya M1 Mic-- amzn.to/3uW0nnn
Wireless Mic:-- amzn.to/3TqLRhE
Tripod1 -- amzn.to/4avjyF4
Tripod2:-- amzn.to/46Y3QPu
camera1:-- amzn.to/3GIQlsE
camera2:-- amzn.to/46X190P
Pentab (Medium size):-- amzn.to/3RgMszQ (Recommended)
Pentab (Small size):-- amzn.to/3RpmIS0
Mobile:-- amzn.to/47Y8oa4 (You absolutely should not buy this one)
Laptop -- amzn.to/3Ns5Okj
Mouse+keyboard combo -- amzn.to/3Ro6GYl
21 inch Monitor-- amzn.to/3TvCE7E
27 inch Monitor-- amzn.to/47QzXlA
iPad Pencil:-- amzn.to/4aiJxiG
iPad 9th Generation:-- amzn.to/470I11X
Boom Arm/Swing Arm:-- amzn.to/48eH2we
My PC Components:-
intel i7 Processor:-- amzn.to/47Svdfe
G.Skill RAM:-- amzn.to/47VFffI
Samsung SSD:-- amzn.to/3uVSE8W
WD blue HDD:-- amzn.to/47Y91QY
RTX 3060Ti Graphic card:- amzn.to/3tdLDjn
Gigabyte Motherboard:-- amzn.to/3RFUTGl
O11 Dynamic Cabinet:-- amzn.to/4avkgSK
Liquid cooler:-- amzn.to/472S8mS
Antec Prizm FAN:-- amzn.to/48ey4Pj
Manish bhai, what an amazing person you are. The content and knowledge are fantastic. Thank you for the videos.
You've set things on fire, sir ji. I had already completed Spark, but only today did I learn it this deeply, from your channel. Overwhelming content. 🙂🙂
I have been following your channel for a long time. I love your content. I am preparing for data engineering, and these videos are helping me very much. Thank you so much.
Attendance Marked
I was doing a course on Coursera that was boring and hard to understand; then I got to know about your playlist. Bro, your videos are damn good.
The right class for individuals and for beginners. ❤❤❤
Thank you.
Bhai, you explained it really well... excellent, thanks
Really Appreciated. I like the content.
I am watching two hours before my university exams! I can understand everything clearly! Hats off, man
Thanks bhai, I genuinely understood so much from watching the video
Best playlist on the internet
Manish brother, your content is really awesome.
Feeling lucky to find your channel.
Outstanding... keep it up. Very good, short, informative videos. Make more videos with more detail.
highly recommended for all
Well explained. Thanks for the consistent videos.
Namaste Sir,
I have a doubt about the explanation at 21:00.
The fault tolerance HDFS provides works at the cluster level: if a node fails, recovery happens, and the master node handles that recovery.
But Spark is a compute engine. If the storage is still HDFS and a node fails there, data recovery will happen the same way it did in the Hadoop ecosystem. So how does the DAG handle fault tolerance in Spark? As far as I understand, the DAG will recompute the data, but I don't understand under what circumstances Spark would have to use the DAG to recompute/reprocess something. Please explain if you have an example/use case.
I watched it 3 times... awesome video
Bhaiya, please take this series forward.
I have wanted to learn this for a long time, and your videos are really great.
Sure
thanks for explaining WHYs! very helpful!
Great content Manish bhai, really good comparison, good points!
Thanks!
Amazing content Manish bhaiya 🙌.. Looking forward to more such exciting and knowledgeable video content.....
Very detailed yaar.... Thanks
Thank you Manish, started following you lately ... Amazing content .. Keep up the good work
Sir, please make a playlist covering the full PySpark syllabus.
Thank you, bhai
Great work, thank you👌🙌
Brilliant explanations!
Are you a fellow data engineering aspirant ?
You are the best.
Why we will use Hive, if we have already Spark in our project, Any specific reason ?
Are PySpark and Spark the same thing?
Can I learn PySpark from this video or not?
Directly connect with me on:- topmate.io/manish_kumar25
Bro, please upload a video every day so our consistency is maintained.
Bhaiya, in Hadoop, fault tolerance exists only at the storage level, right? There's no fault tolerance at the application level, is there? Correct me if I am wrong.
Fault tolerance in what? Data storage or processing?
GOOD VIDEO🤟
thank you brother
Marking my attendance 🙏
Manish, Hadoop was developed by former Yahoo developer Doug Cutting, not by Google.
thank you sir
I think the title should be MapReduce vs Spark. In Hadoop you can use both, right?
Yes, it should be MapReduce vs Spark, but the term "Hadoop vs Spark" is more popular.
@@manish_kumar_1 Don't stick with the popularity, stick with the concept, to avoid confusion.
Bhai, I have a question related to the DAG. If process 3 fails, the DAG knows the steps to regenerate the output of process 3. But what happens when process 1 fails? How does the DAG recover from that? And what exactly is a "process"?
If "process 1" fails in the DAG, the recovery would typically involve retrying or restarting "process 1" itself. The success of this recovery depends on whether "process 1" is independent or has dependencies. If it has dependencies, those may need to be reprocessed as well to ensure a consistent state in the workflow. Essentially, DAG recovery for a failed process involves identifying the failure point, addressing it, and potentially rerunning dependent processes to maintain the integrity of the workflow.
Thanks
ChatGPT
Let me elaborate it:
A Directed Acyclic Graph (DAG) in Spark represents a computational workflow where nodes denote tasks or operations, and directed edges illustrate dependencies between these tasks. In the context of fault tolerance, if a task like "process 1" fails, the DAG aids recovery by re-executing the failed task based on information collected from its dependencies, ensuring the computational flow continues.
Consider a scenario where you apply five transformations to a DataFrame (DF). Each transformation creates a new DF as DFs are immutable. If, for instance, "transformation 4" fails during execution, Spark retrieves information from "transformation 3's" DF (its dependency) and then re-executes "transformation 4."
Regarding your question about "process 1" failure, if it fails, recovery involves restarting "process 1." Given interdependencies between tasks, subsequent transformations won't proceed if the initial process fails. The DAG orchestrates this recovery process by ensuring the restarting of the failed task, allowing the entire workflow to progress seamlessly.
If I am wrong, then please someone let me know, because I am also a beginner in the data domain.
@@soumyaranjanrout2843 Thanks for the detailed explanation.
What is the meaning of "Given interdependencies between tasks, subsequent transformations won't proceed if the initial process fails."?
@@TheBest-yh1yj In simpler terms, if one step in a process fails, the following steps that depend on it also get stuck until the initial issue is resolved. To simplify further: since the tasks are interdependent, if task 1 fails (as per your question), the remaining tasks that rely on its output cannot continue until that initial task completes successfully. Hope you understood it 😊
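The lineage idea this thread is describing can be sketched in plain Python. To be clear, this is a toy simulation, not the real Spark API: each node just remembers its parent and the transformation that produced it, so any lost result can be rebuilt by walking back up the DAG to data that still exists.

```python
# Toy simulation of Spark-style lineage recovery (NOT real Spark APIs).
# Each node remembers its parent and the transformation that produced it,
# so a lost result can always be recomputed from the surviving source data.

class LineageNode:
    def __init__(self, data=None, parent=None, transform=None):
        self.data = data            # materialized result; may be lost on "failure"
        self.parent = parent        # upstream node in the DAG
        self.transform = transform  # function applied to the parent's data

    def compute(self):
        # If this node's result was lost, recompute it from its lineage.
        if self.data is None:
            self.data = self.transform(self.parent.compute())
        return self.data

# Build a small DAG: source -> t1 (filter odds) -> t2 (multiply by 10)
source = LineageNode(data=[1, 2, 3, 4, 5])
t1 = LineageNode(parent=source, transform=lambda xs: [x for x in xs if x % 2 == 1])
t2 = LineageNode(parent=t1, transform=lambda xs: [x * 10 for x in xs])

print(t2.compute())   # [10, 30, 50]

# Simulate a node failure: t1's and t2's materialized results are lost.
t1.data = None
t2.data = None

# The DAG recomputes everything downstream from the surviving source.
print(t2.compute())   # [10, 30, 50] again
```

This is also why a failure at "process 1" is different: if the very first input is gone, recomputation only works because the underlying storage layer (e.g. HDFS replication) can still serve the source data; the DAG handles redoing the transformations, not replacing lost input blocks.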
Thanks bhai
Hello Manish bhaiyya, I have two years of experience at a service-based company doing web development, and I want to switch to a data engineering profile. I learnt SQL and am learning Python after watching your videos, but my company does not change roles internally. How do I switch into a data engineering role? Please answer this.
Watch my video titled "How I bagged 12 offers".
I don't want to code. Can I learn data engineering or should I go for Devops engineering?
Thanks for the detailed explanation!!
Minor correction: Hadoop was created at Yahoo!, not Google.
Hi Manish, can you also make a video on a Spark-related project that would be useful for aspiring data scientists too, just like the one you created specifically for data engineering?
Thanks in advance!
Will try
@@manish_kumar_1 thanks
Bhaiya, 128 MB is the default block size, right, and we can customize it per our needs? My question is: in which cases do we decrease the block size, and in which cases do we increase it?
If we have many smaller disk blocks, the seek time goes up (time spent seeking/looking for information). Also, having many small blocks is a burden on the NameNode/master, since the NameNode stores the metadata and has to keep an entry for every block.
@@manish_kumar_1 Thank you :)
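The NameNode-burden part of that answer is easy to see with a rough back-of-the-envelope calculation. The numbers below are illustrative assumptions, not exact values — in particular, the ~150 bytes of NameNode heap per block object is a commonly cited estimate, not a spec:

```python
# Rough sketch of why tiny HDFS blocks burden the NameNode.
# Assumptions (illustrative only): a 1 TiB file, and ~150 bytes of
# NameNode metadata per block -- a commonly cited estimate, not exact.

FILE_SIZE = 1 * 1024**4          # 1 TiB file
BYTES_PER_BLOCK_ENTRY = 150      # approximate NameNode metadata per block

def block_overhead(block_size_mb):
    """Return (number of blocks, approx NameNode metadata bytes)."""
    block_size = block_size_mb * 1024**2
    num_blocks = -(-FILE_SIZE // block_size)   # ceiling division
    metadata_bytes = num_blocks * BYTES_PER_BLOCK_ENTRY
    return num_blocks, metadata_bytes

for mb in (4, 64, 128, 256):
    blocks, meta = block_overhead(mb)
    print(f"{mb:>4} MB blocks -> {blocks:>7} blocks, "
          f"~{meta / 1024:.0f} KiB of NameNode metadata")
```

So shrinking blocks from 128 MB to 4 MB multiplies the block count (and the NameNode's in-memory metadata) by 32x for the same file. That is the usual trade-off: increase the block size for large files scanned sequentially (fewer seeks, fewer tasks, less metadata), and only consider smaller blocks when you need finer-grained parallelism and the metadata cost stays manageable.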
Thanks
Hi Manish, how do you make such notes in OneNote? What stylus/device is required for this? I want to purchase a similar device for digital note-taking. Please advise.
A pen tab is required to write into a notebook or PPT. You can buy one online. I have the medium-size one. You can find the link in the description.
@@manish_kumar_1 Is it the iPad pencil you're referring to? Will the Wacom One pen tablet work the same?
@@reachrishav Yes, but it won't have a screen. You get a pad and a stylus. You write on the pen tab with the stylus, but whatever you write shows up on the laptop in OneNote, PPT, or any other software you're using.
@@manish_kumar_1 Thanks. I guess you are using a Wacom tablet/stylus for this video?
Bhaiya, do I also need to study Hadoop, or will Spark be enough?
From Hadoop, study HDFS and YARN. You don't need MapReduce.
@@manish_kumar_1 Yes, I've covered that much, bhaiya, from Great Learning; that's how I understood why MapReduce is slow.
done
Manish sir,
Can you provide a Data Engineering course or tutorial videos?
If you can, please share the link so that I can buy the tutorials or course.
I teach for free. Watch my "12 offers" video; you'll find all the free resources there.
🔥🙇🏻🙏🏻
Which course is this?
Spark
@@manish_kumar_1 What is it for?
@@abidkhan.10 for Data Engineer roles
@@manish_kumar_1 Bro, upload videos of this series daily.
Bro Hadoop made by Yahoo engineers not Google
Bro, listen to the rest of this lecture's content!! That's what the interviewer will ask!!! Not the thing you're correcting 😅