I'd just like to say thanks for making this video so visually simple. That includes thanks for not including a load of stock footage of different groups of youngish people inspecting laptop screens together, and extra bonus thanks for not cutting out to 1.5 second clips from Marvel MCU movies every 15 seconds. It's actually refreshing to see a YouTube video that's directly, consistently focused on communicating knowledge about its topic.
Hi, thank you very much for leaving such a comment. I’m glad to hear that you appreciate the focus on the important stuff. :-) Apologies for the late reply.
DUUUUDE, I was not expecting to understand git in one video. There are still the specific commands to learn but this video created a solid start for me. Thank you!
Thanks a lot dude! Yeah, it’s a journey, but understanding internals is a very good start. It’s not strictly necessary, but I personally like knowing how the tools that I use work under the hood.
It is super nice to have a short video that dives deep into the underlying principles of git, especially the objects and how merge and rebase work. Thanks a lot for the video!
A great, concise and well animated explanation. Great job! PS: Guess you've been blessed by the youtube algorithm, and deservedly so. I'll watch your other videos too.
One of the best videos explaining git imo! Explains everything extremely clearly!!! No confusing language, and no mixing of different terminology. The only thing that isn't crystal clear is the very last example of "cherry picking" when rebasing the feature branch onto the main branch. It isn't very clear from this example between which commits the diffs are taken. In the example you want to rebase commit F onto commit D by applying the diff between F and its parent commit E (which also happens to be the 1st commit of the feature branch since moving off from main) onto commit D along with the diff between D and E. But you never mention what would happen if there were more commits in the feature branch between E and F. Is the rebase operation going to take the diff between the last commit of feature and the 1st commit of feature, or the immediate parent of the feature? In your example those two possibilities are the same since there are no commits between the 1st and last commit of the feature branch. Anyway, that's the only thing I could find that didn't make sense. Maybe I missed something and just need to watch it again.
Thank you! I've been using git for over a decade and you've finally made it clear! If I ran the internet, this video would be shown at the top of any "how to" search about git.
After years of using Git, I finally understand rebase =) However, I personally prefer merge, but this is probably due to how I use branches. Thanks for the great video!
Rebase is useful for tidying up private branches before making things public. Once a branch is public, rebasing is going to annoy people who try to copy your branches.
@@lawrencedoliveiro9104 Indeed, projects I have contributed to usually want things rebased to the latest commit on the main public branch. They also prefer squashed commits, so your changes over multiple commits on your private branch appear as a single commit for merging to the main branch. I also like it this way, because it means I can mess around to my heart's content in my private branch until I have something I am ready to show the world, and any embarrassing mistakes in the commit history of my private branch don't need to be part of the public history.
Thank you for the comment. That’s great to hear. I have no doubt that you’ll end up using Git at some point in your career, and I’m glad to hear that you’ve found this video helpful.
Very good, please keep making videos. Very clear explanation. I don't know where I picked up this habit but, as a matter of practice, while in I rebase with pulling in everything new from into the branch. Run tests to ensure my feature changes work with state. Then if all good, I go into and merge in which is always straightforward because already has the updates in . It seems kind of pointless after watching your video.
Thanks :) Actually, if I understand your approach correctly then it sounds like that is indeed the correct way to use rebase. In the video, I omitted the step where feature branch is rebased on the main branch. I think that has to happen and we can then use merge with fast-forward to update the main branch. I didn't omit this intentionally, it was an oversight.
Very nice explanation of git! A real example of merging with some code and the command prompt would have been a nice addition for me, but I will try it by myself.
THX, Nikola - you earned yourself a new subscriber. I remember struggling to keep "some progress" available during "code rollback" back in the late 1980s. That was NOT an easy task when you had an 8.3 filename restriction on every file. And then jumping to Unix, every Unix with its own flavour of oddities.
Thanks a lot Michael! That sounds tough. Fortunately, I didn't have to deal with such problems :) I've only used SVN prior to Git, which wasn't great but acceptable.
I like to call the common commit the Base (because that's what it is) Rebase immediately makes perfect sense at that point - you are taking a list of commits from one base to a new base; you are re-basing
6:54 If 2 commits make the same change to a file and no other changes and had the same commit message and author, do they get the same SHA1? Or in addition to change(s), is SHA1 based on timestamp and other factors so that even in this case the two commits get different SHA1s.
Great explanation!!! Subscribed! 6:25 The trees on lines 3, 9, and 15 are rooted at the repo root, right? Regardless of where in the directory structure the change(s) were made.
Apologies if you’ve already addressed this in the video, but when rebasing at 12:00, does it warn you if the changes from commit E and commit F will overwrite the changes made from commit C to commit D (main), since the feature branch isn’t aware of the changes in main from C onwards.
Hi, yes it does. It would result in a merge conflict. If you watch the part on how cherry picking is implemented you’ll see that it uses the same algorithm as merging, which ensures that changes are not lost.
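To make the "same algorithm as merging" point concrete, here is a minimal sketch of a three-way merge. It is only an illustration under loud assumptions: snapshots are plain dicts of path to content, the names `three_way_merge`, `base`, `main_d` and `feature_e` are made up for this example, and the comparison is per whole file, whereas real Git also merges at line level inside a file.

```python
def three_way_merge(base, ours, theirs):
    """Merge two snapshots (path -> content) relative to their common base.

    Returns (merged_snapshot, conflicting_paths). Whole-file granularity only.
    """
    merged, conflicts = {}, []
    for path in sorted(set(base) | set(ours) | set(theirs)):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:        # both sides agree (same edit, or both deleted)
            result = o
        elif o == b:      # only "theirs" changed this path
            result = t
        elif t == b:      # only "ours" changed this path
            result = o
        else:             # both changed it differently: nothing is overwritten
            conflicts.append(path)
            continue
        if result is not None:   # None means the path was deleted
            merged[path] = result
    return merged, conflicts


# Hypothetical history: C is the common base, D is on main, E is on feature.
base = {"app.py": "v1", "README": "hello"}
main_d = {"app.py": "v1 + main change", "README": "hello"}
feature_e = {"app.py": "v1 + feature change", "README": "hello", "new.py": "x"}

merged, conflicts = three_way_merge(base, main_d, feature_e)
print(merged)
print(conflicts)
```

Because `app.py` changed on both sides, it is reported as a conflict instead of one version silently replacing the other, which is the behaviour described in the reply above.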
Thank you, this was really helpful! Amazing to learn that one of my most used tools is based on foundational CS concepts like key-value stores and trees!
Thanks! At 6:05, read "previous: initial-prototype" instead of "previous: initial-project" for the 2nd entry. Question: how does it avoid SHA-1 collisions, even though the probability is low? It looks like it would have to check the whole database each time, then invent something to change the digest in case of collision.
Good catch! Yes, it should say "previous: initial-prototype". I haven't experimented with hash collisions myself, but I think that git won't do anything about collisions locally. This answer might be helpful: stackoverflow.com/questions/9392365/how-would-git-handle-a-sha-1-collision-on-a-blob
Very nice and simple explanation of the internals. It would be cool if you could produce a follow up video where you go even more in depth. For example you could explain what happens with the trees and objects when a user command is executed. And I guess there are many more things to explain. Also things like 7:24, you say there is no metadata but where is that stored?
Thank you. Yes, I will likely produce a follow up video on git, but maybe with more focus on network synchronisations. 7:24 gets me to recap so I couldn’t find what you are referring to, but my guess is that you mean metadata about blobs such as filename and metadata about commits? I have mentioned both in the video: - filenames are stored in tree objects - messages, author, etc are stored in commit objects Is this what you mean or did I misunderstand the comment?
Yes that was my question, I tried to go back and forth again but could not find where you mentioned these two details 😢, can you provide the timestamps?
Sure. I speak about filenames being stored in git trees at 4:20 th-cam.com/video/RxHJdapz2p0/w-d-xo.html Specifically, the sentence saying "tree solves the problem of not having a filename associated with a blob". There's also a tree visualization that shows filenames associated with blob object IDs. I talk about commits at 5:40 th-cam.com/video/RxHJdapz2p0/w-d-xo.htmlsi=pYAahuiGe71jjSB_&t=338 I say that we store information about a single change into commits. Does this answer your question or is it maybe unclear in some way?
Okay thx but what about additional meta data of the filesystem like the bits for read write execute? I liked the summary you gave and I think it would have been perfect to add these details. Thx again, looking forward to more videos.
Excellent video, I have no words to describe how well it's presented. One important aspect I find missing is how git shows the changes (deltas) in a file between two commits. I can imagine that it traverses the nodes of the tree and then computes the difference, but I have seen git calculate a diff in a large monorepo project in a fraction of a millisecond, which suggests there might be an area to explore here.
Thanks a lot for the comment. Take the following with a grain of salt because I haven't validated if git does this, but since Git Trees are essentially Merkle Trees, this is probably how it works. If you want to compare two commits, this is essentially the same as comparing two git trees / merkle trees. Each git tree node has an ID, which is computed from its contents (other git trees or blobs). The same git trees will have exactly the same IDs, so comparing them is very fast. The problem is when two git trees don't have the same contents; then we have to compare the contents of both trees (again, note that we only compare the IDs of sub-trees and files). I think this is actually very fast in practice because we never need to compare the actual contents of a file unless we know they are different in both trees. Eventually, we will identify the set of files whose content is different in each git tree, and we can run the diff tool only for these files. Above is a bit of a braindump, but hopefully it makes sense. Let me know if it doesn't though, and I'll try to elaborate.
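The braindump above can be sketched in a few lines. This is a simplified model, not Git's real tree encoding: trees are nested dicts, files are bytes, and `object_id` and `changed_paths` are hypothetical helper names. The point it illustrates is that a subtree whose ID matches on both sides is skipped without looking inside it.

```python
import hashlib


def object_id(node):
    """Content-derived ID for a file (bytes) or a tree (dict of name -> node)."""
    if isinstance(node, bytes):
        return hashlib.sha1(b"blob\0" + node).hexdigest()
    listing = "".join(
        f"{name}\0{object_id(child)}\n" for name, child in sorted(node.items())
    )
    return hashlib.sha1(b"tree\0" + listing.encode()).hexdigest()


def changed_paths(old, new, path=""):
    """Return the paths of files that were added, removed, or modified."""
    # Equal IDs mean equal contents, so the whole subtree can be skipped.
    # (Git reads stored IDs here; this sketch recomputes them for simplicity.)
    if old is not None and new is not None and object_id(old) == object_id(new):
        return []
    result = []
    if isinstance(old, bytes) or isinstance(new, bytes):
        result.append(path)  # a file differs at this path
    old_entries = old if isinstance(old, dict) else {}
    new_entries = new if isinstance(new, dict) else {}
    for name in sorted(set(old_entries) | set(new_entries)):
        child_path = f"{path}/{name}" if path else name
        result += changed_paths(
            old_entries.get(name), new_entries.get(name), child_path
        )
    return result


old_tree = {"src": {"a.py": b"1", "b.py": b"2"}, "docs": {"readme": b"hi"}}
new_tree = {"src": {"a.py": b"1", "b.py": b"3"}, "docs": {"readme": b"hi"}}
print(changed_paths(old_tree, new_tree))
```

Only the files returned here would then need to go through the actual line-by-line diff tool.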
Yes, that’s correct. The main eventually becomes F’ but only after the feature (using merge fast-forward). I made a mistake in the animation where I was focused on the desired outcome.
Thanks. I might create a video that shows how git diff for a single file works. Thanks for the suggestion. I don't know if git does anything special to track renamed files. The renamed file will have the same content (unless the content was changed as well), so its ID will remain the same. However, the tree object containing the list of files will change.
So how are commits actually stored on the filesystem then? Is it a text file stored as an object with a similar format to http headers? What about branches? Are they symlinks to a "commit" object?
Commits are stored in the object database, same as blobs. Note that the object database is just a bunch of files in git (in the .git/objects directory). On disk each object is zlib-compressed, so the file looks binary, but the decompressed commit is plain text: a few header lines (tree, parent, author, committer), a blank line, and then the message, so the layout is loosely header-like. If you're curious about how git represents a commit in memory, you can find out more here: github.com/git/git/blob/d0e8084c65cbf949038ae4cc344ac2c2efd77415/commit.h#L26. Branches are saved as files under .git/refs/heads/. These are not symlinks. The file for the corresponding branch contains the commit ID it points to.
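For anyone who wants to poke at this, here is a small sketch of how an object ID is derived from a commit's contents. It assumes the object layout described in the Pro Git book ("Git Internals"): the ID is the SHA-1 of a `<type> <size>` header, a NUL byte, and the body. The helper names and the author identity are made up for the example, and the timezone is fixed to +0000 to keep it short.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 over '<kind> <size>', a NUL byte, and then the body."""
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


def commit_body(tree, parent, author, timestamp, message):
    """Build the plain-text payload of a commit object (simplified)."""
    lines = [f"tree {tree}"]
    if parent:
        lines.append(f"parent {parent}")
    lines.append(f"author {author} {timestamp} +0000")
    lines.append(f"committer {author} {timestamp} +0000")
    return ("\n".join(lines) + "\n\n" + message + "\n").encode()


empty_tree = git_object_id("tree", b"")          # a tree with no entries
who = "Example Author <author@example.com>"      # hypothetical identity

first = commit_body(empty_tree, None, who, 1700000000, "Initial commit")
later = commit_body(empty_tree, None, who, 1700000060, "Initial commit")

print(first.decode())
print(git_object_id("commit", first))
print(git_object_id("commit", later))
```

The two commits have the same tree, author and message and differ only in their timestamps, yet they get different IDs, because the timestamps are part of the hashed text.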
Nikola, in the Object Database, the 'Key' column contains SHA1 keys, so far so good. What does the 'Object' column contain? The actual object contents, as shown in 1:50? Or merely the paths to object files? I ask because at 2:37 you did say, "The contents of each object are stored in a file." That sounds like the contents are not themselves in the database. Is perhaps the Object Database composed of just one column and not two as shown? Never mind, I've just googled it and found that there's also a filepaths reference store connected as though by a database primary key to the object store, and both the stores merely function like one database. Your graphics are only conceptual.
Hi Dario! I think the confusion comes from the fact that the object database is not a table. I use a table-like view to visualize the properties of the database, but in practice, the object database is just a bunch of files. The key is the filename (left column) while the object (right column) are the contents of the file. To answer your questions: > What does the 'Object' column contain? The actual object contents, as shown in 1:50? The "Object" column contains the actual object contents. > Is perhaps the Object Database composed of just one column and not two as shown? I tried to explain above, but let me know if it still doesn't make sense.
> Never mind, I've just googled it and found that there's also a filepaths reference store connected as though by a database primary key to the object store, and both the stores merely function like one database. Your graphics are only conceptual. Correct, graphics are only conceptual.
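Here is a minimal sketch of the "key is the filename, object is the file contents" idea. It assumes the loose-object layout described in the Pro Git book: the first two hex characters of the ID become a subdirectory and the remaining characters become the filename. `write_loose_object` and `read_loose_object` are hypothetical helpers, and the example writes into a temporary directory rather than a real repository.

```python
import hashlib
import os
import tempfile
import zlib


def write_loose_object(objects_dir, data, kind="blob"):
    """Store data the way a loose object is laid out: objects/ab/cdef..."""
    store = f"{kind} {len(data)}".encode() + b"\0" + data
    oid = hashlib.sha1(store).hexdigest()
    path = os.path.join(objects_dir, oid[:2], oid[2:])
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(store))   # contents are zlib-compressed on disk
    return oid, path


def read_loose_object(objects_dir, oid):
    """Look an object up by its key and return (kind, data)."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as f:
        store = zlib.decompress(f.read())
    header, data = store.split(b"\0", 1)
    return header.split(b" ")[0].decode(), data


with tempfile.TemporaryDirectory() as objects_dir:
    oid, path = write_loose_object(objects_dir, b"hello world\n")
    print(oid)
    print(os.path.relpath(path, objects_dir))
    print(read_loose_object(objects_dir, oid))
```

Splitting on the first two characters spreads the objects over at most 256 subdirectories, so no single directory has to hold every object.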
So git merge combines the latest commits in two branches, but rebase starts by combining the first two diverging commits after the common branch, and from there it takes a sequential path
10:10 Maybe this could do with some example commit messages and example code? It might be easier to follow along if it's presented in a real-world scenario.
I have a question. Even though git creates subdirectories to store the files, if it needs to create more subdirectories for a large project it still cannot exceed the folder limit which is 65K , right?
Hi Rafsan, FAT32 uses one index per folder which can reference around 65k files or folders (the actual number is more subtle because it depends on the length of the file/folder names, but this is a good approximation). Each folder has its own index, so you could store ~65k files/folders per folder. This means that you could have a much bigger number of folders in total by organizing them into subfolders, in a tree-like structure. You are right that there is an overall hard limit for the number of files/folders on FAT32 file system, but this number is much bigger than 65k, I think it's around ~250 million files.
Well done, excellent video! It might also have been cool to mention how git finds the differences between 2 branches when merging. I haven't looked into git specifically, but it makes sense to me that the Merkle tree is used to find which files differ, and LCS to find the differences between the files?
Thanks! :-) I agree that it would have been good, but I had to cut somewhere. Maybe I'll make a new video about git at some point. Yes, with the Merkle tree we can find the differing files very quickly. The diff between two files is based on LCS (line by line, the Myers algorithm), although I think there are a few algorithms and that it's configurable in git.
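As a rough illustration of the LCS idea, here is a line diff built on the textbook dynamic-programming LCS table. To be clear about the assumption: this is not the Myers algorithm mentioned above, which finds the same kind of edit script more efficiently. `lcs_diff` is a made-up name for this sketch.

```python
def lcs_diff(old, new):
    """Line diff from a longest-common-subsequence table (plain DP, not Myers)."""
    n, m = len(old), len(new)
    # lcs[i][j] = length of the LCS of old[i:] and new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    out, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:                    # line is part of the LCS: keep
            out.append("  " + old[i])
            i, j = i + 1, j + 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:    # dropping the old line loses nothing
            out.append("- " + old[i])
            i += 1
        else:                                   # otherwise take the new line
            out.append("+ " + new[j])
            j += 1
    out += ["- " + line for line in old[i:]]    # leftover removals
    out += ["+ " + line for line in new[j:]]    # leftover additions
    return out


for line in lcs_diff(["a", "b", "c"], ["a", "x", "c"]):
    print(line)
```

Lines that belong to the longest common subsequence are kept as context, and everything else becomes a removal or an addition.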
If you have a large file (say several megabytes or even gigabytes) and you change one bit in the file, does it save the whole new file in the blob store or does git have any clever tricks to only store the changes?
I think git does something to improve the storage of blobs, but it's not smart enough to realise that 99% of the blob is the same. My guess is that this is probably optimized for text files rather than random binaries. I don't know if Git LFS does something though, but I'd expect it to have some kind of optimizations that work better for binaries.
With that said, I don't know exactly what Git does, but here I've tested the above as follows:
- Create an empty git project (its size is negligible)
- Create a 1GB file with random data: `dd if=/dev/urandom of=sample.txt bs=1G count=1`
- Commit changes. The repo size is ~2GB: ~1GB for the blob and 1GB for the working directory. (see [1])
- Change 1 byte in the sample.txt file.
- Commit the change. The repo size is ~2.58GB. (see [2] for inspecting the blob)
- One more time. The repo size is 3.11GB.
[1]
~/test (main) $ git ls-tree 60914ef
100644 blob 1d2579e731b4de097bda567f86bcf70d4c9fb4c6 sample.txt
~/test (main) $ ll -h ./.git/objects/1d/2579e731b4de097bda567f86bcf70d4c9fb4c6
-r--r--r-- 1 Nikola 197121 1.1G Oct 2 23:27 ./.git/objects/1d/2579e731b4de097bda567f86bcf70d4c9fb4c6
[2]
~/test (main) $ ll -h .git/objects/81/c9a65ac724cfcd7afaa1986609de53368ef2ae
-r--r--r-- 1 Nikola 197121 543M Oct 2 23:21 .git/objects/81/c9a65ac724cfcd7afaa1986609de53368ef2ae
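A smaller version of this experiment can be run without git at all. Loose objects are stored whole and zlib-compressed (delta compression between similar objects only happens later, when objects are packed, e.g. by `git gc`), so the sketch below just compares how zlib copes with random bytes and with repetitive text. The sizes and the name `compressed_size` are arbitrary choices for the example.

```python
import os
import zlib


def compressed_size(data: bytes) -> int:
    """Size of the data after zlib compression, as used for loose objects."""
    return len(zlib.compress(data))


random_blob = os.urandom(1_000_000)                        # ~1 MB of random bytes
edited_blob = bytes([random_blob[0] ^ 0xFF]) + random_blob[1:]   # one byte changed
text_blob = b"the same line of text\n" * 50_000            # highly repetitive text

for label, blob in [
    ("random", random_blob),
    ("random, 1 byte changed", edited_blob),
    ("repetitive text", text_blob),
]:
    print(f"{label}: {len(blob)} bytes raw, {compressed_size(blob)} bytes compressed")
```

Random data barely compresses, and each version of the file is compressed on its own, so two nearly identical random files cost roughly twice the space until the repository is packed.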
This is a great video. It would be 1000x more useful if it illustrated each concept with corresponding git command - simple, e.g. commit, to advanced, e.g. rebase. Perhaps someone here could point me to a video where git concepts are matched to git commands, please?
At 8:37, the animation looks like a rebase. Shouldn't it be a new commit which contains all the changes? Thanks for the clear explanation though, great vid!
good videos, but one question regarding rebase. Should the feature branch be updated, not the main one? It seems like you are rebasing feature branch on top of main, but moving main instead of feature
Thank you for the comment. That’s a good point. In practice, the feature branch would be updated first (using git rebase command), then the main branch would be moved forward (e.g. using git merge command). My focus was on the mechanics/idea of the rebase, and I forgot to include that step. Hopefully the explanation still makes sense.
I think there is one error in the rebase part. When you do:
git checkout feature && git rebase main
after the rebase we have:
- main branch points to d
- e becomes e' and its father is d
- f becomes f' and its father is e'
- feature branch no longer points to f; it points to f'
No branch points to e and f any more. e and f are not lost; you can find them when you do git reflog.
Yes, thanks. That was a mistake in the animation that has been pointed out in a few comments so far. You're right: the main branch will point to d and the feature branch will point to f' after `git rebase main`. Then we have a final step to merge the main and feature branches with `git merge`, which simply applies a fast-forward merge and updates main to point to f' as well.
Regarding:
- e becomes e' and its father is d
- f becomes f' and its father is e'
e and f will remain unchanged, so I don't think it's accurate to say that e becomes e'. Do you mean e' will correspond to e (and f' to f)?
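The pointer movements discussed here can be modelled in a few lines. This is a toy sketch under stated assumptions: history is linear (each commit has one parent), commits are just labels in a child-to-parent dict, and `ancestors`, `rebase` and `fast_forward` are illustrative names rather than anything from Git itself. It ignores file contents and conflicts entirely.

```python
def ancestors(commits, tip):
    """Walk parent links from tip to the root (tip first)."""
    chain = []
    while tip is not None:
        chain.append(tip)
        tip = commits[tip]
    return chain


def rebase(commits, branches, branch, onto):
    """Replay the commits unique to `branch` on top of `onto`, oldest first."""
    onto_history = set(ancestors(commits, branches[onto]))
    to_replay = [c for c in ancestors(commits, branches[branch])
                 if c not in onto_history]
    new_parent = branches[onto]
    for old in reversed(to_replay):      # one new commit per replayed commit
        copy = old + "'"
        commits[copy] = new_parent       # the original commit is left untouched
        new_parent = copy
    branches[branch] = new_parent        # only the rebased branch moves


def fast_forward(commits, branches, branch, target):
    """Move `branch` to `target` if that only advances along existing history."""
    if branches[branch] not in ancestors(commits, branches[target]):
        raise ValueError("not a fast-forward")
    branches[branch] = branches[target]


commits = {"c": None, "d": "c", "e": "c", "f": "e"}   # child -> parent
branches = {"main": "d", "feature": "f"}

rebase(commits, branches, "feature", "main")
print(branches)                  # main has not moved yet
fast_forward(commits, branches, "main", "feature")
print(branches)
print(ancestors(commits, "f'"))
```

Each feature commit is replayed in turn on top of the previous copy, the originals e and f stay in the commit dict with no branch pointing at them, and main only moves in the separate fast-forward step.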
QUESTION: I tried this myself and found that a commit is not a full snapshot but only the changes it knows about. Create a master branch and create file_1 (1st commit). Now check out a "feat" branch. Do a 2nd commit creating file_2. Do a 3rd commit creating file_3. Do a 4th commit creating file_4. Do a 5th commit creating file_5. Do a 6th commit creating file_6. Now "git switch master" and git cherry-pick "7th commit hash". And boom: in master you see only file_1 and then file_7. Not a full snapshot. (I mean, even though the 7th is the latest commit in feat, it doesn't have the full snapshot, just the changes it remembers from that time.) That's why file_2 to file_6 don't come into master: because we used cherry-pick and not merge.
Hi, thanks for the question. Your experiment produces the expected results, i.e. cherry-picking a single commit will apply the *diff* between that commit and its parent, not the whole commit. This is why you’re seeing this behaviour even though commits are full snapshots. I have explained this in the video - have you seen the part that explains how cherry-picking works?
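The experiment can be reproduced with a toy model in which every commit really is a full snapshot. This is a sketch under assumptions: snapshots are dicts of filename to content, `diff` and `apply_changes` are made-up helpers, and conflicts are ignored (real Git applies the change with a three-way merge).

```python
def diff(parent, commit):
    """Changes that turn `parent` into `commit`: path -> new content (None = deleted)."""
    return {
        path: commit.get(path)
        for path in set(parent) | set(commit)
        if parent.get(path) != commit.get(path)
    }


def apply_changes(snapshot, changes):
    """Return a new snapshot with the changes applied on top of `snapshot`."""
    result = dict(snapshot)
    for path, content in changes.items():
        if content is None:
            result.pop(path, None)
        else:
            result[path] = content
    return result


master = {"file_1": "one"}

# Build the feat branch: each commit is a FULL snapshot with one more file.
feat = [master]
for n in range(2, 8):
    feat.append({**feat[-1], f"file_{n}": str(n)})

picked, parent = feat[-1], feat[-2]
print(sorted(picked))                      # the snapshot holds file_1 .. file_7
print(diff(parent, picked))                # but its diff only mentions file_7
print(sorted(apply_changes(master, diff(parent, picked))))
```

The picked commit contains all seven files, but only the difference from its parent is carried over, so master ends up with file_1 and file_7.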
@@TechWithNikola yes yes. Today I tried very hard and I found commit is a full snapshot. You are right about cherry-pick. Even when you Cherry-pick direct merge commit which have 2 parents, you have to specify using -m that which parent do you want to get diff from. NOTE : I create file_1 in master with first commit. file_2 - 2nd commit file_3 - 3rd commit file_4 - 4th commit file_5 - 5th commit Now 6th commit I changes some content in file_1. `git diff HEAD HEAD~1` I checked difference between latest commit and 2nd last latest. Even commit - 5th just only add file_5, It was showing file_1 content changed. That means the commit is full snapshot... in 5th commit ( full snapshot) the file_1 was as it is.. in 6th commit I changes file_1 content... And it will show everything where I am on 5th commit at that time to 6th commit full snapshots changes..even I didn't touched file_1 in my 5th commit. ( only file_5 added and rest of snapshot stays as it is) Wow man you are such a genius.
YouTube recommendations brought me here, and I certainly am not disappointed. Thanks for an informative yet concise explanation!
I’m glad you’ve found it useful. If you have any suggestions for improvements please let me know.
It’s always great to gain a deeper understanding of something I already use daily. Thank you for this! ❤
You’re welcome. I'm very happy that people find it useful! ❤️
It is super nice to have a short video that dives deep into the underlying principles of git, especially the objects and how merge and rebase work. Thanks a lot for the video!
You're welcome. I'm glad you've liked it :)
One of the best videos about Git I've ever seen, great job
Thanks. This means a lot to me!
Glad I came across this video of yours, amazing quality in the narration. Keep doing more. Thanks for your efforts ❤👏
Thank you so much for your kind words and support. I'm thrilled that you've enjoyed it. Your encouragement means a lot to me! ❤
I've been using git for 15 years and I learned a lot. Thanks!
You're welcome. Great to hear that!
A great, concise and well animated explanation. Great job!
PS: Guess you've been blessed by the youtube algorithm, and deservedly so. I'll watch your other videos too.
Thank you!
This video made clear to me concepts many others have tried and failed to help me understand. Really well done thanks man!
Thanks, I'm so glad that you've found it helpful!
This video is amazing, thank you. Hope you get the recognition you deserve!
Thank you very much for the kind words!
Thank you!
This is both a simple introduction and an excellent explanation of how Git works.
You’re welcome! Glad to hear you’ve enjoyed it.
This is the best video for a comprehensive understanding of how git works internally.
Thank you so much…
After years of using Git, I finally understand rebase =) However, I personally prefer merge, but this is probably due to how I use branches. Thanks for the great video!
You’re welcome. I’m glad you liked it!
beautifully explained, thanks Nikola!
I have seen 2 videos from your channel and it's top-notch content❤
not many can explain things this beautifully...more content please
this is true gold. best explanation i have ever seen yet for GIT.
Thank you :)
you are a fantastic teacher sir. Don't know how i found you, but i love you
Super nice video. Thanks for your efforts both here and on the channel more widely!
Thank you for taking the time to comment Zebedee. I'm very happy to hear that you like my videos.
I've never used Git, but now I feel like I would have a much easier time learning it! Great video, and thanks for making it!
Thank you for the comment. That’s great to hear. I have no doubt that you’ll end up using Git at some point in your career, and I’m glad to hear that you’ve found this video helpful.
I never understood Git or any version trackers before. I do now thanks to your awesome video. Wonderful and clear explanation. Thanks
Thanks a lot :) Glad you've liked it!
This channel is gold 🥇. Hope you keep this quality and make more video. I learn a lot from you. Thank you.
Thank you, and you’re welcome. I’m very happy to hear that you like my videos. I’ll do my best to keep and improve the quality.
I love your work so much. It is very informative and concise. It was a pleasure
Thank you so much! :)
Great video! Concise and simple explanation with enough details to understand what's going on under the hood.
Thanks!
Finally, a clear explanation! Thanks.
Glad it helped!
Wow, the way you teach will change the understanding of what's under the hood for deep learners. This is worth a great like
Thanks❤🎉
Thank you. I’m really glad you think so ❤️
Very well made video, loved the quality ❤️. Keep it up man 💯.
Thanks a lot! ❤️
Thanks, Pro. The best explanation I've ever seen. 1000 likes!!!
The explanation provided was excellent. I truly enjoyed the video!
Thank you Mina. I'm glad you've enjoyed it!
This was so interesting and informative. Thank you so much!
Thank you Nikola. Like others, I too came from a YouTube recommendation. Excellent explanation. Subscribed to your channel. Nice work.
Thanks a lot Manick. Hope you’ll enjoy my future videos too!
finally found a video which made me understand the whole thing. Thank you!
You're welcome! I'm glad you've liked it.
Very nice explanation of git! A real example of merging with some code and the command prompt would have been a nice addition for me, but I will try it by myself.
Thanks. That’s a great suggestion.
I'm very surprised to see 'only' 700 subscribers. Keep it up!
Thanks a lot!
this is one of the best videos explaining git
the merge strategies are also really greatly depicted here
thank you for your work🎉
Thanks a lot for the kind words. I’m very happy to hear that you’ve liked it.
YES I'VE BEEN WAITING FOR AN ANIMATED VIDEO THAT COVERS THE .git FOLDER! THANKS SO MUCH!
You're welcome! I'm glad you've liked it.
wow ya. youtube recommended me but i stayed for the whole video! cheers!
I’m glad you’ve enjoyed it. Cheers!
I like to call the common commit the Base (because that's what it is)
Rebase immediately makes perfect sense at that point - you are taking a list of commits from one base to a new base; you are re-basing
That makes sense, thanks.
Great video! Finally I feel like I am ready to rebase something... 🥳
Thank you! Happy rebasing 😀
That’s an amazing video. I never understood git this way before
Thanks. Glad you liked it!
Very good explanation of git internals!
This was incredibly valuable, thank you!
You’re welcome!
This is what i need. Thanks for the great video
You’re welcome! 🙂
Recommendations brought me here too. I like the use of visuals and your details! Keep it up!
Thanks a lot! I’ll do my best to make sure future videos are even better.
Thank you so much for this amazing video. Keep up the good work.
Thanks a lot!
Thank you for this, it was very insightful. Nice to know how things work.
Glad it was helpful!
This is high quality content. Thanks!
Thank you!
Thank you for the explanation.
Please make more videos like this.
You’re welcome. 🙂
I discovered a great channel.
I'm not so familiar with git and this helped me a lot.
So to use git better I use single small files.
Really glad it helped!
Nice work Nikola!
Thank you!
Thank you, this was really helpful! Amazing to learn that one of my most used tools is based on foundational CS concepts like key-value stores and trees!
You’re welcome. I agree. Git is a good example for why data structures are important.
Thanks! At 6:05, read "previous: initial-prototype" instead of "previous: initial-project" for the 2nd entry. Question: how does it avoid SHA-1 collisions, even though the probability is low? It looks like it would have to check the whole database each time, then invent something to change the digest in case of collision.
Good catch! Yes, it should say "previous: initial-prototype"
I haven't experimented with hash collisions myself, but I think that git won't do anything about collisions locally.
This answer might be helpful: stackoverflow.com/questions/9392365/how-would-git-handle-a-sha-1-collision-on-a-blob
Very informative. Thanks!
Very nice and simple explanation of the internals. It would be cool if you could produce a follow up video where you go even more in depth. For example you could explain what happens with the trees and objects when a user command is executed. And I guess there are many more things to explain. Also things like 7:24, you say there is no metadata but where is that stored?
Thank you. Yes, I will likely produce a follow up video on git, but maybe with more focus on network synchronisations.
7:24 takes me to the recap, so I couldn’t find what you are referring to, but my guess is that you mean metadata about blobs, such as the filename, and metadata about commits? I have mentioned both in the video:
- filenames are stored in tree objects
- messages, author, etc are stored in commit objects
Is this what you mean or did I misunderstand the comment?
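A small sketch of where the filename lives, assuming the loose-object layout Git uses for hashing (`<type> <size>\0<payload>`) and a simplified tree that contains only files. The helper names `object_id` and `tree_payload` are invented for this example. Each tree entry carries a mode (for example `100644` for a regular file, `100755` for an executable one), the name, and the ID of the blob.

```python
import hashlib


def object_id(kind: str, payload: bytes) -> str:
    """SHA-1 over '<kind> <size>\\0<payload>', the way Git names objects."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def tree_payload(entries):
    """entries: list of (mode, name, blob ID in hex). Files only.

    Simplification: entries are sorted by plain name, which matches Git
    only as long as the tree contains no subdirectories.
    """
    out = b""
    for mode, name, oid in sorted(entries, key=lambda entry: entry[1]):
        out += f"{mode} {name}".encode() + b"\0" + bytes.fromhex(oid)
    return out


blob = object_id("blob", b"hello\n")  # the blob knows nothing about its name
tree_a = object_id("tree", tree_payload([("100644", "hello.txt", blob)]))
tree_b = object_id("tree", tree_payload([("100644", "renamed.txt", blob)]))

print(blob)
print(tree_a)
print(tree_b)  # differs from tree_a: only the tree changed, not the blob
```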
Yes that was my question, I tried to go back and forth again but could not find where you mentioned these two details 😢, can you provide the timestamps?
Sure. I speak about filenames being stored in git trees at 4:20 th-cam.com/video/RxHJdapz2p0/w-d-xo.html
Specifically, the sentence saying "tree solves the problem of not having a filename associated with a blob". There's also a tree visualization that shows filenames associated with blob object IDs.
I talk about commits at 5:40 th-cam.com/video/RxHJdapz2p0/w-d-xo.htmlsi=pYAahuiGe71jjSB_&t=338
I say that we store information about a single change into commits. Does this answer your question or is it maybe unclear in some way?
Okay thx but what about additional meta data of the filesystem like the bits for read write execute? I liked the summary you gave and I think it would have been perfect to add these details. Thx again, looking forward to more videos.
Excellent video, I have no words for how well it's presented. One important aspect I find missing is how git shows the delta between two commits for a file. I can imagine that it traverses the nodes of the tree and then computes the difference, but I have seen git calculate a diff in a large monorepo in a fraction of a millisecond, which suggests there is something worth exploring here.
Thanks a lot for the comment.
Take the following with a grain of salt because I haven't validated if git does this, but since Git Trees are essentially Merkle Trees, this is probably how it works.
If you want to compare two commits, this is essentially the same as comparing two git trees / merkle trees. Each git tree node has an ID, which is computed from its contents (other git trees or blobs). The same git trees will have exactly the same IDs, so comparing them is very fast. The problem is when two git trees don't have the same contents; then we have to compare the contents of both trees (again, note that we only compare the IDs of sub-trees and files). I think this is actually very fast in practice because we never need to compare the actual contents of a file unless we know they are different in both trees. Eventually, we will identify the set of files whose content is different in each git tree, and we can run the diff tool only for these files.
Above is a bit of a braindump, but hopefully it makes sense. Let me know if it doesn't though, and I'll try to elaborate.
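To make the braindump a bit more concrete, here is a toy sketch of that comparison. It assumes an in-memory `store` and a made-up hashing scheme (not Git's real object format); `put` and `changed_paths` are hypothetical names. The point is only that a whole subtree is skipped as soon as two IDs match.

```python
import hashlib

store = {}  # object ID -> ("blob", bytes) or ("tree", {name: child ID})


def put(node):
    """Store a nested dict (directory) or bytes (file) and return its ID."""
    if isinstance(node, bytes):
        oid = hashlib.sha1(b"blob:" + node).hexdigest()
        store[oid] = ("blob", node)
    else:
        children = {name: put(child) for name, child in node.items()}
        listing = "".join(f"{n}={c};" for n, c in sorted(children.items()))
        oid = hashlib.sha1(b"tree:" + listing.encode()).hexdigest()
        store[oid] = ("tree", children)
    return oid


def changed_paths(old_id, new_id, path=""):
    """Return paths of files that differ, never descending into equal IDs."""
    if old_id == new_id:
        return []  # same ID means same content, nothing below can differ
    old_kind, old_val = store.get(old_id, ("missing", None))
    new_kind, new_val = store.get(new_id, ("missing", None))
    out = []
    if "blob" in (old_kind, new_kind):
        out.append(path)  # a file was added, removed, or modified here
    if "tree" in (old_kind, new_kind):
        old_kids = old_val if old_kind == "tree" else {}
        new_kids = new_val if new_kind == "tree" else {}
        for name in sorted(set(old_kids) | set(new_kids)):
            child = f"{path}/{name}" if path else name
            out += changed_paths(old_kids.get(name), new_kids.get(name), child)
    return out


v1 = put({"src": {"a.py": b"1", "b.py": b"2"}, "docs": {"readme": b"hi"}})
v2 = put({"src": {"a.py": b"1", "b.py": b"3"}, "docs": {"readme": b"hi"},
          "new.txt": b"n"})
print(changed_paths(v1, v2))  # only these files would go to the diff tool
```

The `docs` directory is never opened in this run because its ID is identical in both versions.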
absolutely amazing video on git!
Glad you liked it! :)
In the rebase example, it's not main which becomes F', but feature. And after rebase, feature has to be merged to main again.
Yes, that’s correct. main eventually points to F’ as well, but only after feature does (via a fast-forward merge). I made a mistake in the animation because I was focused on the desired outcome.
Wow this great explanation ❤. I suppose one of the tradeoffs of git is that there can be only 1 history.
Nice video! I would find it interesting to learn how git tracks renamed files and finds out which lines are changed
Thanks. I might create a video that shows how git diff for a single file works. Thanks for the suggestion.
I don't know if git does anything special to track renamed files. The renamed file will have the same content (unless the content was changed as well), so its ID will remain the same. However, the tree object containing the list of files will change.
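A short sketch of how an unchanged ID makes an exact rename easy to spot when two snapshots are compared. The blob ID formula (`blob <size>\0<content>` hashed with SHA-1) is Git's; the function `exact_renames` and the file names are invented for illustration, and pairing files whose content also changed would need a similarity heuristic that is not shown here.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Git-style blob ID: the filename is not part of the hash."""
    header = f"blob {len(content)}".encode() + b"\0"
    return hashlib.sha1(header + content).hexdigest()


def exact_renames(old_files, new_files):
    """Pair paths that disappeared with paths that appeared with equal IDs."""
    old_ids = {path: blob_id(data) for path, data in old_files.items()}
    new_ids = {path: blob_id(data) for path, data in new_files.items()}
    deleted = {oid: path for path, oid in old_ids.items() if path not in new_ids}
    added = {path: oid for path, oid in new_ids.items() if path not in old_ids}
    return sorted((deleted[oid], path) for path, oid in added.items()
                  if oid in deleted)


before = {"util.py": b"x = 1\n", "main.py": b"import util\n"}
after = {"helpers.py": b"x = 1\n", "main.py": b"import util\n"}
print(exact_renames(before, after))
```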
Ngl !! It went all over my head, maybe cause English isnt my first language but great work 👍
Very clean, I did understand something. Thank you.
Glad you've found it useful!
You explained how git works wow!!! 🌟
Thank you for the comment. I'm glad it was helpful! :)
So how are commits actually stored on the filesystem then? Is it a text file stored as an object with a similar format to http headers? What about branches? Are they symlinks to a "commit" object?
Commits are stored in the object database, same as blobs. Note that the object database is just a bunch of files in git (in the .git/objects directory). Each file is zlib-compressed, so it looks binary on disk, but once decompressed a commit is plain text.
It is loosely similar to HTTP headers: a few `key value` lines (tree, parent, author, committer), a blank line, then the commit message. You can print one with `git cat-file -p <commit-id>`. If you're curious about the in-memory representation, see: github.com/git/git/blob/d0e8084c65cbf949038ae4cc344ac2c2efd77415/commit.h#L26.
Branches are saved as files under .git/refs/heads/. These are not symlinks. The file for the corresponding branch contains the commit ID it points to.
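For the curious, a rough sketch of what a loose commit object looks like. One detail worth flagging: the file on disk is zlib-compressed, which is why it looks binary, while the decompressed payload is text. `make_commit` and `read_commit` are hypothetical helpers, and the author string and timestamps are made-up sample values.

```python
import hashlib
import zlib


def make_commit(tree, parents, author, message):
    """Build a commit object; return (commit ID, compressed bytes)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {parent}" for parent in parents]
    lines += [f"author {author}", f"committer {author}", "", message]
    payload = "\n".join(lines).encode() + b"\n"
    raw = f"commit {len(payload)}".encode() + b"\0" + payload
    return hashlib.sha1(raw).hexdigest(), zlib.compress(raw)


def read_commit(compressed):
    """Inverse of make_commit: return (header, field list, message)."""
    raw = zlib.decompress(compressed)
    header, payload = raw.split(b"\0", 1)
    head, _, message = payload.decode().partition("\n\n")
    fields = [line.split(" ", 1) for line in head.split("\n")]
    return header.decode(), fields, message


EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
AUTHOR = "A U Thor <author@example.com> {} +0000"  # sample identity

first_id, first_obj = make_commit(EMPTY_TREE, [], AUTHOR.format(1700000000),
                                  "Initial commit")
later_id, _ = make_commit(EMPTY_TREE, [], AUTHOR.format(1700000001),
                          "Initial commit")

print(read_commit(first_obj))
print(first_id == later_id)  # the timestamp is part of the hashed text

# A branch is then just a small file holding a commit ID, for example the
# contents of .git/refs/heads/main would be:
branch_file_contents = first_id + "\n"
```

Because the author and committer lines include a timestamp, two commits with the same tree, message, and author still get different IDs when they are created at different times.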
Really well done! Subbed.
Thank you!
Nikola, in the Object Database, the 'Key' column contains SHA1 keys, so far so good. What does the 'Object' column contain? The actual object contents, as shown in 1:50? Or merely the paths to object files? I ask because at 2:37 you did say, "The contents of each object are stored in a file." That sounds like the contents are not themselves in the database. Is perhaps the Object Database composed of just one column and not two as shown?
Never mind, I've just googled it and found that there's also a filepaths reference store connected as though by a database primary key to the object store, and both the stores merely function like one database. Your graphics are only conceptual.
Hi Dario!
I think the confusion comes from the fact that the object database is not a table. I use a table-like view to visualize the properties of the database, but in practice, the object database is just a bunch of files. The key is the filename (left column), while the object (right column) is the contents of the file.
To answer your questions:
> What does the 'Object' column contain? The actual object contents, as shown in 1:50?
The "Object" column contains the actual object contents.
> Is perhaps the Object Database composed of just one column and not two as shown?
I tried to explain above, but let me know if it still doesn't make sense.
> Never mind, I've just googled it and found that there's also a filepaths reference store connected as though by a database primary key to the object store, and both the stores merely function like one database. Your graphics are only conceptual.
Correct, graphics are only conceptual.
@@TechWithNikola Wow, you're quick! :) Thanks for the answers. The video is awesome, I'm still watching it.
So git merge combines the latest commits in two branches, but rebase starts by combining the first two diverging commits after the common branch, and from there it takes a sequential path
Yes, that sounds right to me.
Minor correction: "after the common branch" -> "after the common commit"
10:10 maybe this could do with some example commit messages, and example code? it might be easier to follow along if its presented in a real world scenario
Thanks for the suggestion. Yeah, that may have been better.
I have a question. Even though git creates subdirectories to store the files, if it needs to create more subdirectories for a large project it still cannot exceed the folder limit, which is 65K, right?
Hi Rafsan, FAT32 uses one index per folder which can reference around 65k files or folders (the actual number is more subtle because it depends on the length of the file/folder names, but this is a good approximation). Each folder has its own index, so you could store ~65k files/folders per folder. This means that you could have a much bigger number of folders in total by organizing them into subfolders, in a tree-like structure.
You are right that there is an overall hard limit for the number of files/folders on FAT32 file system, but this number is much bigger than 65k, I think it's around ~250 million files.
This is so good !
Thank you!
Well done, excellent video! It might have been cool if you had also mentioned how git finds the differences between 2 branches when merging. I haven't dug into git specifically, but it makes sense to me that the Merkle tree is used to find which files differ, and the differences between files are found with LCS?
Thank you! :-)
I agree that it would have been good, but I had to draw the line somewhere. Maybe I'll make a new video about git at some point.
Yes, using the Merkle tree we can find the differing files very quickly. The diff between two files is based on LCS (line by line, the Myers algorithm), though I think there are a few algorithms and, as far as I know, this is configurable in git.
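A minimal sketch of an LCS-based line diff. Note that this is the textbook dynamic-programming LCS, swapped in for clarity; it is not the Myers algorithm, which finds the same kind of edit script more efficiently. The function name `lcs_diff` is made up for this example.

```python
def lcs_diff(old, new):
    """Return a list of (' ', line), ('-', line), ('+', line) tuples."""
    n, m = len(old), len(new)
    # lcs[i][j] = length of the longest common subsequence of old[i:], new[j:]
    lcs = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if old[i] == new[j]:
                lcs[i][j] = lcs[i + 1][j + 1] + 1
            else:
                lcs[i][j] = max(lcs[i + 1][j], lcs[i][j + 1])

    # Walk the table from the start, emitting kept, removed and added lines.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if old[i] == new[j]:
            out.append((" ", old[i]))
            i += 1
            j += 1
        elif lcs[i + 1][j] >= lcs[i][j + 1]:
            out.append(("-", old[i]))
            i += 1
        else:
            out.append(("+", new[j]))
            j += 1
    out += [("-", line) for line in old[i:]]
    out += [("+", line) for line in new[j:]]
    return out


for sign, line in lcs_diff(["a", "b", "c"], ["a", "x", "c"]):
    print(sign, line)
```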
If you have a large file (say several megabytes or even gigabytes) and you change one bit in the file, does it save the whole new file in the blob store or does git have any clever tricks to only store the changes?
I think git does something to improve the storage of blobs, but it's not smart enough to realise that 99% of the blob is the same. My guess is that this is probably optimized for text files rather than random binaries. I don't know if Git LFS does something though, but I'd expect it to have some kind of optimizations that work better for binaries. With that said, I don't know exactly what Git does, but here I've tested the above as follows:
- Create an empty git project (its size is negligible)
- Create a 1GB file with random data `dd if=/dev/urandom of=sample.txt bs=1G count=1`
- Commit changes. The repo size is ~2GB. ~1GB for the blob and 1GB for the working directory. (see [1])
- Change 1 byte in the sample.txt file.
- Commit the change. The repo size is ~2.58GB. (see [2] for inspecting the blob)
- One more time. The repo size is 3.11GB.
[1]
~/test (main)
$ git ls-tree 60914ef
100644 blob 1d2579e731b4de097bda567f86bcf70d4c9fb4c6 sample.txt
~/test (main)
$ ll -h ./.git/objects/1d/2579e731b4de097bda567f86bcf70d4c9fb4c6
-r--r--r-- 1 Nikola 197121 1.1G Oct 2 23:27 ./.git/objects/1d/2579e731b4de097bda567f86bcf70d4c9fb4c6
[2]
~/test (main)
$ ll -h .git/objects/81/c9a65ac724cfcd7afaa1986609de53368ef2ae
-r--r--r-- 1 Nikola 197121 543M Oct 2 23:21 .git/objects/81/c9a65ac724cfcd7afaa1986609de53368ef2ae
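One piece of background that may help read those numbers: each loose object is compressed on its own with zlib, and delta compression between similar objects only happens when objects are packed (for example by `git gc`). The sketch below does not try to reproduce the sizes above; it only illustrates, with made-up sample data, that zlib helps a lot for repetitive text and not at all for random bytes.

```python
import os
import zlib


def compressed_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower is better)."""
    return len(zlib.compress(data)) / len(data)


random_bytes = os.urandom(1_000_000)
source_like = b"def handler(request):\n    return respond(request)\n" * 20_000

print(f"random bytes: {compressed_ratio(random_bytes):.2f}")
print(f"text:         {compressed_ratio(source_like):.2f}")
```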
@@TechWithNikola That seems quite wasteful, but that could just be because it's optimized for speed, not storage size.
Very helpful. Thank you!
Glad you've found it useful!
Great explanation!
Thank you!
Good video. You sound exactly like Antti from "Road to Vostok" 👀
Great video! Thanks a lot
Glad you liked it!
Thanks a lot! That was amazing!
You’re welcome! 😀
Great video thank you!
This is a great video. It would be 1000x more useful if it illustrated each concept with corresponding git command - simple, e.g. commit, to advanced, e.g. rebase. Perhaps someone here could point me to a video where git concepts are matched to git commands, please?
Amazing video . 🙌🏻🔥💯💯♥️
At 8:37, the animation looks like a rebase; shouldn't it be a new commit which contains all the changes? Thanks for the clear explanation though, great vid!
good videos, but one question regarding rebase. Should the feature branch be updated, not the main one?
It seems like you are rebasing feature branch on top of main, but moving main instead of feature
Thank you for the comment. That’s a good point. In practice, the feature branch would be updated first (using git rebase command), then the main branch would be moved forward (e.g. using git merge command).
My focus was on the mechanics/idea of the rebase, and I forgot to include that step. Hopefully the explanation still makes sense.
this is a very good video
Thanks a lot!
Crisp and informative
Thank you!
I think there is one error in the rebase part:
when you do:
git checkout feature && git rebase main
after the rebase, we have:
- the main branch points to d
- e becomes e' and its father is d
- f becomes f' and its father is e'
- the feature branch no longer points to f; it points to f'
no branch points to e or f any more
e and f are not lost, you can find them when you do
git reflog
Yes, thanks. That was a mistake in the animation that was pointed out in a few comments so far. You're right, the main branch will point to d and the feature branch will point to f' after `git rebase main`. Then there is a final step to merge the feature branch into main with `git merge`, which simply applies a fast-forward merge and updates main to point to f' as well.
Regarding the:
- e becomes e' and its father is d
- f becomes f' and its father is e'
e and f will remain unchanged, so I don't think it's accurate to say that e becomes e'. Do you mean e' will correspond to e (and f' to f)?
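A toy model of the pointer movement being discussed, assuming commits are just records with a parent and a label for their change. The helpers `commit`, `ancestors`, and `rebase` are invented for this sketch, the shortened IDs are not real Git IDs, and applying the actual diffs (where conflicts could happen) is left out.

```python
import hashlib

commits = {}  # commit ID -> {"parent": ID or None, "change": label}


def commit(parent, change):
    """Create a commit; the ID depends on the parent, as in Git."""
    cid = hashlib.sha1(f"{parent}|{change}".encode()).hexdigest()[:7]
    commits[cid] = {"parent": parent, "change": change}
    return cid


def ancestors(cid):
    """The commit itself followed by its parents, newest first."""
    chain = []
    while cid is not None:
        chain.append(cid)
        cid = commits[cid]["parent"]
    return chain


def rebase(branch_tip, new_base):
    """Replay commits that are on the branch but not on new_base."""
    base_history = set(ancestors(new_base))
    to_replay = [c for c in ancestors(branch_tip) if c not in base_history]
    tip = new_base
    for cid in reversed(to_replay):  # oldest first
        tip = commit(tip, commits[cid]["change"])  # new commit, new ID
    return tip


a = commit(None, "A")
b = commit(a, "B")
c = commit(b, "C")
d = commit(c, "D")  # main
e = commit(c, "E")
f = commit(e, "F")  # feature
branches = {"main": d, "feature": f}

f_prime = rebase(branches["feature"], branches["main"])
branches["feature"] = f_prime           # main still points to d here
branches["main"] = branches["feature"]  # the later fast-forward merge

print(f, f_prime)        # different IDs: f' is a new commit
print(e in commits, f in commits)  # the originals still exist
```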
@@TechWithNikola Yes, I mean e' contains the same thing as e. e' does not have the same SHA-1 as e, which is why I call it e'
Thanks for sharing.
You’re welcome!
Thanks, much appreciated!
I’m glad you’ve found it useful!
Excellent job!
Thank you!
QUESTION:
I tried it myself and found that a commit is not a full snapshot but only the changes it knows about.
Create the master branch; file_1 created (1st commit).
Now checkout the "feat" branch.
Do a 2nd commit: file_2 created.
Do a 3rd commit: file_3 created.
Do a 4th commit: file_4 created.
Do a 5th commit: file_5 created.
Do a 6th commit: file_6 created.
Do a 7th commit: file_7 created.
Now "git switch master".
git cherry-pick "7th commit hash"
And boom..
In master you see only file_1 and then file_7.
Not a full snapshot. (I mean, even though the 7th is the latest commit in feat, it doesn't have the full snapshot, just the changes it remembered at that time.)
That's why file_2 to file_6 don't show up in master.
Because we used cherry-pick and not merge.
Hi, thanks for the question. Your experiment produces the expected results, i.e. cherry-picking a single commit will add the *diff* between that commit and its parent, not the whole commit. This is why you’re seeing this behaviour even though commits are full snapshots. I have explained this in the video - have you seen the part that explains how cherry-picking works?
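The experiment can be modelled in a few lines, assuming snapshots are plain dictionaries and using a naive "apply the diff" step (real Git uses its merge machinery here, so conflicts are not modelled). `diff` and `cherry_pick` are hypothetical helpers.

```python
def diff(parent, child):
    """Changes that turn snapshot `parent` into snapshot `child`."""
    changed = {path: data for path, data in child.items()
               if parent.get(path) != data}
    removed = [path for path in parent if path not in child]
    return changed, removed


def cherry_pick(target, parent, picked):
    """Apply only the parent -> picked diff on top of `target`."""
    changed, removed = diff(parent, picked)
    result = {path: data for path, data in target.items()
              if path not in removed}
    result.update(changed)
    return result


master = {"file_1": "one"}
feat = [dict(master)]                # feat starts from master's snapshot
for n in range(2, 8):                # commits 2..7, one new file each
    feat.append({**feat[-1], f"file_{n}": str(n)})

picked, parent = feat[-1], feat[-2]
print(sorted(picked))                               # full snapshot
print(sorted(cherry_pick(master, parent, picked)))  # only the diff applied
```

The 7th commit really does contain all seven files; only the difference from its parent (the new `file_7`) is carried over to master.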
@@TechWithNikola yes yes. Today I tried very hard and I found that a commit is a full snapshot. You are right about cherry-pick. Even when you cherry-pick a merge commit directly, which has 2 parents, you have to specify with -m which parent you want the diff taken from.
NOTE:
I created file_1 in master with the first commit.
file_2 - 2nd commit
file_3 - 3rd commit
file_4 - 4th commit
file_5 - 5th commit
Now in the 6th commit I changed some content in file_1.
`git diff HEAD HEAD~1`
I checked the difference between the latest commit and the one before it.
Even though the 5th commit only added file_5, the diff showed the file_1 content change.
That means a commit is a full snapshot...
In the 5th commit (full snapshot) file_1 was unchanged..
In the 6th commit I changed the file_1 content...
And the diff shows everything that changed between the full snapshot at the 5th commit and the full snapshot at the 6th commit, even though I didn't touch file_1 in my 5th commit (only file_5 was added and the rest of the snapshot stayed as it was).
Wow man you are such a genius.
@@DhavalAhir10 that’s great. I’m glad that it all makes sense now!
Amazing content quality
Thank you
In prison you would definitely be one of the "git" guys 👌👌👌👌
hi. super nice colors here !
Where are you (not-UK) from ?
Hi, thanks! I'm from Serbia :)
What a video!
Thank you!