Want to learn more Big Data technology? You can get lifetime access to our courses on the Udemy platform. Visit the link below for discounts and coupon codes: www.learningjournal.guru/courses/
Nothing is tough when you have a good teacher. Kudos for your work sir.
No one, I repeat, no one has explained Hadoop with this perfection. A million thanks.
The best explanation of the standby node on the Internet!!
Crisp, simple, and pictorial is what I call the best teaching. You are the best tutor.
I have gone through so many tutorials but the way you explained sir makes it so easy to understand hadoop. Thanks a lot sir!!
I have become a fan of your style of teaching. Thank you, sir. 😊
Thank you so much for these topics. I spent a lot of time reading books, but I couldn't understand anything until I watched your tutorials. Big thanks.
You make things very simple to understand..... Hats off to your effort !!
As you described, the role of the Secondary NameNode (SNN) is to take a checkpoint at a configured interval and update the on-disk fsimage by applying the edit logs captured since the last checkpoint. To further reduce its restart time, the primary NameNode runs the same checkpoint process: it reads the on-disk fsimage stored by the SNN, applies the edit log entries to create the latest fsimage, and stores it in memory. A few questions on this:
1. Where does the SNN store the fsimage? Is it on disk in the local file system?
2. How does the primary NameNode get access to the Secondary NameNode?
Excellent explanation, sir. Hats off.
You are the best teacher.. Thanks a lot
I have been learning HDFS for the last 7 days, but my concepts were still not clear. Today I watched your video and everything is clear. Thank you.
What an Explanation 🙏🙏🙏🙏🙏🙏❤️❤️❤️❤️❤️❤️❤️
Your explanation is very clear, thank you. Kindly keep uploading new videos.
Great work, sir. Thanks for the video.
Good tutorial. Thank you for your efforts
Great explanation, thanks for your efforts :)
Very good tutorial. Easy to understand.
This was beautiful! Thank you.
The presentation and explanation were excellent.
Thanks. It's very clear. Piece of advice for viewers: these tutorials can easily be watched at 2x speed.
Awesome Sir ..Thank You
Awesome sir...great explanation👌👌
It was very good information.
Really nice explanation. If you could start a practical implementation of a POC as an end-to-end project, it would be very useful for all of us. Thanks for your efforts and time.
Very good tutorial. The only thing I want to say is that the fsimage is not only in memory but also stored on disk. Please excuse me if I am not correct on this point.
Thanks for your clear explanation. Awesome!
Thanks for the detailed explanation.
Simply and superbly explained.
Very nice explanation!
Explanation was clear.
I have a few questions:
1) While setting up a cluster using Hadoop 2, how does ZooKeeper initially elect the leader among the NameNodes?
2) Can you explain the functionality of the NameNode's failover controllers?
Bro, I would love to answer.
When you set up a new cluster, the active NN will be the one you selected to be the NN.
And
later, if it fails, the ZKFC (ZooKeeper Failover Controller) is responsible for making the standby node the active node.
Hope this helps you.
When you set up a new cluster, the active NameNode will be the one you selected. If that NN goes down, ZooKeeper comes into play through the ZKFC (ZooKeeper Failover Controller), which is responsible for making the standby NameNode the active NameNode.
Very useful explanation.
Very nicely explained.
Very nice. I could not understand much about the Secondary NameNode, but I will try to understand it.
Why? Is it because the explanation is not clear? You can ask your doubts if you have any.
No, the explanation is very nice, but the fsimage and edit log are not clear to me.
And thank you very much.
Great tutorial. Thanks for sharing.
Hello! Is there a PPT version of this video? I need to explain this to students. The representation is superb.
Thank you for this excellent tutorial. I am new to this topic, and none of the tutorials or blogs I went through gave a clear picture of what happens in the checkpoint process of the SNN and of the NN. So, can you please confirm my understanding of this topic (in non-HA mode)?
1) After every checkpoint run, does the SNN clear the edit log on the NameNode as well? If so, at any time the edit log on the NN has data only since the last checkpoint run on the SNN.
2) The fsimage of the NN gets updated automatically in real time (i.e., as and when changes are made to the file system), which means the NameNode has the latest fsimage in its memory at all times.
3) At any given time, the fsimage on the Secondary NameNode holds the file system image as of the last checkpoint run.
4) After a reboot, the NameNode picks up the fsimage from the Secondary NameNode and the edit log from the NN's local disk, and merges them to create a new fsimage file that is up to date with all changes as of then.
Great and clear explanation, thanks.
Why can't we dump the fsimage directly to disk while restarting the NameNode? After restarting, it can read the fsimage and push it to memory, which would be faster.
Very informative. Thanks
Highly recommended for anyone who wishes to learn about how fault tolerance is managed in HDFS.
In addition to this, I have a question: are block recovery, lease recovery, and pipeline recovery done in addition to the methods described in the video for fault tolerance, or are they done at a deeper level of the described methods?
Nice tutorial, sir.
Can you make a video on why an RDD is immutable and what would have happened had it not been immutable?
Good lecture.
Great work. I have 2 questions:
- Regarding the checkpoint activity, does the Secondary NN keep the on-disk fsimage on its local HD, or is it on the active NN's HD?
- Is the hour between checkpoints configurable?
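On the second question: in Hadoop 2.x the interval is a configuration setting. As far as I know, `dfs.namenode.checkpoint.period` (in seconds, default 3600) sets the time between checkpoints, and `dfs.namenode.checkpoint.txns` (default 1,000,000) forces a checkpoint once that many edits have accumulated, whichever comes first. A minimal Python sketch of that trigger rule follows. The function name and constants are illustrative assumptions, not Hadoop code.

```python
# Illustrative only: the constants mirror the defaults of
# dfs.namenode.checkpoint.period and dfs.namenode.checkpoint.txns.
# The function itself is hypothetical and is not part of Hadoop.
CHECKPOINT_PERIOD_SECONDS = 3600
CHECKPOINT_TXNS = 1_000_000


def should_checkpoint(seconds_since_last, uncheckpointed_txns,
                      period=CHECKPOINT_PERIOD_SECONDS,
                      txns=CHECKPOINT_TXNS):
    """Return True when the time limit or the edit-count limit is reached."""
    return seconds_since_last >= period or uncheckpointed_txns >= txns
```

With the defaults, a quiet cluster checkpoints once an hour, while a very busy one checkpoints earlier because the edit count hits its limit first.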
Awesome tutorial
very well explained
Great explanation
very good video
Sir, what will happen if DN-1 is slow and does not send its heartbeat as fast as the other nodes? Suppose the NN decides that DN-1 is down and starts replicating the data to a different node, say DN-2, and DN-1's heartbeat reaches the NN while the replication is in progress. Will it stop replicating the data to DN-2?
+Pranav Wagde, I think it is a hypothetical question. Either the NN gets the heartbeat within the expected interval or it doesn't. There is no concept of a slow heartbeat. If the NN realizes that a block is under-replicated, it will make more replicas to fix it. There is no concept of stopping in between. Later, when the NN realizes that the block is over-replicated, it will fix that too by throwing away some replicas.
Thanks for the explanation. Understood the concept.
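The reply above can be read as a simple reconciliation rule: compare the number of live replicas with the target and correct in whichever direction is needed. Here is a minimal Python sketch of that idea. The function name and the default replication factor of 3 are assumptions for illustration, and the real NameNode logic is considerably more involved.

```python
def replication_action(live_replicas, target=3):
    """Decide how to bring a block back to its target replica count.

    Hypothetical helper: returns ("add", n), ("remove", n) or ("none", 0).
    """
    if live_replicas < target:
        # Under-replicated: create the missing copies.
        return ("add", target - live_replicas)
    if live_replicas > target:
        # Over-replicated: discard the surplus copies.
        return ("remove", live_replicas - target)
    return ("none", 0)
```

In the DN-1 scenario, the block first drops to 2 live replicas, so one copy is added on DN-2. If DN-1 then reappears, the block has 4 live replicas and one of them is removed.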
Superb...Thank you so much
Sir, great explanation. I have a doubt: 1) How do I install Cloudera without internet access, and what are the parcel method and the packages method?
Thanks. Could you please explain how to create a Cloudera cluster, as nowadays many clients prefer Cloudera over Hortonworks?
Excellent !
The topic was clear, thank you so much. Can you show it with an example?
nicely explained
Thank you sir
Why is there an odd number of JNs, 3 or 5?
What's the reason behind that?
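A possible answer, as a sketch: as I understand it, the active NameNode needs each edit acknowledged by a majority of the JournalNodes, so N JournalNodes can tolerate (N - 1) / 2 failures. The arithmetic below shows why an even count buys nothing extra. The function names are illustrative assumptions.

```python
def majority(n):
    """Smallest number of nodes that forms a majority of n."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many of n nodes can fail while a majority is still reachable."""
    return n - majority(n)
```

Three JNs tolerate 1 failure and so do four, while five tolerate 2. Adding a fourth node costs a machine without improving fault tolerance, which is why 3 or 5 is the usual choice.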
Nice One
Good.
1. ZooKeeper election
2. Split-brain concepts
3. Hadoop 3, erasure coding and storage policies
Could you please explain all of the above?
Thank you so much ...
Awesome
Can we have different replication factors for different tenants?
You can set it at the topic level, and I guess the tenants of the cluster are not going to share topics. So the answer is yes.
Please upload some Hive and Pig related videos.
Sure, maybe in a month.
Can we use a single node as both the NameNode and the Secondary NameNode?
Yes, we can. However, we don't do it in production.
Okay, thanks.
helpful
How do the fsimage file and the edit log file communicate with each other?
The fsimage does not communicate with the edit log. During the checkpointing process, a new fsimage is created by merging the old fsimage with the new edit log entries.
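The merge described in this reply can be sketched as replaying the logged edits on top of the old image. The Python below is a toy model under loud assumptions: the namespace is a plain dict of path to file size, and the edit tuples (`create`, `delete`, `rename`) are made up for illustration. Real fsimage and edit log files use Hadoop's own binary formats.

```python
def apply_edit(namespace, edit):
    """Apply one logged operation to the in-memory namespace (toy model)."""
    op, path = edit[0], edit[1]
    if op == "create":
        namespace[path] = edit[2]          # edit = ("create", path, size)
    elif op == "delete":
        namespace.pop(path, None)          # edit = ("delete", path)
    elif op == "rename":
        namespace[edit[2]] = namespace.pop(path)  # edit = ("rename", old, new)
    else:
        raise ValueError(f"unknown operation: {op}")


def checkpoint(old_fsimage, edit_log):
    """Return a new image: the old image with every logged edit replayed in order."""
    new_fsimage = dict(old_fsimage)        # leave the old image untouched
    for edit in edit_log:
        apply_edit(new_fsimage, edit)
    return new_fsimage
```

After a checkpoint, the new image already contains every replayed edit, so those edit log entries are no longer needed to rebuild the namespace.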
Hadoop is very fault tolerant. The only point of failure can be Maharashtra State Electricity Board.
Lol! You can keep a backup on inverters. It's not costly.
1.75 x
Great explanation