He talks about sending changes between two folders via a server. I'd like to point out that git doesn't actually need/care-about servers; we can push/pull changes directly from one local folder to another if we like; we can make clones of a folder; and so on. I like to have a folder full of "pristine" repos which I pull and push commits to; whilst making and committing changes in separate "working copies" scattered around my computer as needed. We can also send changes back/forth to multiple servers; for example, I host copies of my git repos on my own server, but I also have copies on GitHub (which acts like a "mirror").
That sounds like an interesting way to keep the "official" commits clean and to the point, but you definitely "lose" or "hide" some of the development history, which can be just clutter depending on the quality of those internal commits, so maybe no real loss there.
I think it is quite common to clean up and simplify your private commit history before sending a bunch of patches to the central authorities for a project. For example, you probably don’t want them seeing every little typo correction or whitespace fixup as a separate commit.
@@ali_p_q7920 that is how you work with Git. You put some stuff in your local copy, clean up the history and push it to the server. The thing with Git is that every user has their own copy of the repository. When they want, they send the differences to some other user, who can merge them into their code base. What GitLab/GitHub does is provide a common place for storing repositories shared between users.
I also use Git for keeping backups of my personal documents and my wife's. Git was never intended for that purpose, but it actually works great. It especially prevents overwriting newer documents with older versions, which tends to happen if you manually copy files to a backup server or drive.
@@AlexAegisOfficial Space wise there is a difference, since your git history also contains all older versions. Since it can't do deltas it contains copies of all older versions.
@@stt.9433 I guess that's per document and git manages a whole folder, multiple documents which is the most common usecase, you want related documents (or even all of them) in one place, plus the built in is probably garbage
In short, private forks and pull requests. Combined with CI builds to validate that it works. We merge about 100 PRs a day at my current project and so far it works fine 🙂
3:41 Tip: You can do “git show --raw «commit»” to get a list of the files affected by that commit. You can similarly list the files affected by each entry in the log with “git log --raw”.
It also shows the file renames, which Git is trying to guess. For those who don't know, in Git, a commit doesn't store any information about file renames. A rename appears as a deleted file and an added file with the same or (not much) different contents.
It's actually pretty interesting that git always stores the whole file. I thought it would just store the diff of files and reconstruct the latest state by reapplying all changes. Instead they store the state of the tree. Learnt something new today. Thanks ❤
@@JonasBergling I always thought that the diff is very central in git. But apparently it's very easy to create, if you can get the complete state of the repo for two commits.
@@florianfanderl6674 yeah, it is. But if you keep the complete separate versions of the files it's trivial to run one of the many existing diff tools on them, so why not just do that, I guess.
Git first stores every version of a file as a whole, zlib-compressed object. Versions are only recorded as deltas against similar objects when they are packed into .pack files (for example by git gc), which makes Git even more space efficient.
I would expect a better explanation from computerphile. A talking man might not be as strong without enough diagrams, animations, demos. Cheers for the dude though, appreciate the explanation.
I agree, his explanation is pretty rough. There is a Russian guy who does a fantastic job explaining how exactly git works under the hood in details on TH-cam: 3FKrszHcIsA
5:22 You can set your working directory to any subdirectory of a project directory. Git will automatically go up into parent directories until it finds one containing a .git subdirectory, or until it hits a filesystem boundary, at which point it gives up.
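A rough Python sketch of that upward search, in case it helps to see it spelled out. The function name `find_repo_root` is made up for illustration, and the sketch deliberately ignores the filesystem-boundary check mentioned above, so it is a simplification rather than Git's actual lookup code.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Walk upward from `start` until a directory containing `.git` is found.

    Hypothetical helper: real Git also stops at filesystem boundaries,
    which this sketch does not attempt to detect.
    """
    current = start.resolve()
    for candidate in [current, *current.parents]:
        if (candidate / ".git").exists():
            return candidate
    return None  # reached the top of the directory tree without finding .git


if __name__ == "__main__":
    print(find_repo_root(Path.cwd()))
```

Running it from any subdirectory of a project prints the project directory, or `None` when no enclosing directory has a `.git` entry.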
No need for a big sluggish relational database. The filesystem is the database. Fast, consistent, works on every system and OS. No overengineering here 👍
Filesystems are not fast for complex operations. Git does relatively simple operations like read/write, but databases are optimized for indexing, processing, storing and querying data relatively often. Git is slow in relation to databases for those purposes, but works really well for what it _is_ built for - version control.
Hi, this isn't meant to sound like a complaint, but a suggestion for an improvement. I noticed that the camera is shaking a little bit throughout the interview. It kinda bothers me, but nonetheless, I still enjoyed this video very much!
*> Git's managing all this stuff, it's probably got kind of root access or something out there. How does that work in terms of security?* Git does _not_ have root access to anything. It runs under the same permissions as the user that's using it, so, if your user name is `computerphile`, then Git will run under the same level of access as that user, in the local directory. Git is not a user, and using the `git` command is no different than using the `ls` command or any other external utility. The corollary of this is that _if_ the `root` user is running Git commands as `root`, then Git will be running with `root` privileges. In practice, this means that `.git/` and any files Git creates under it will be owned by `root`.
In my experience it's not too complex. Make a feature branch, lots of commits while you build or fix the thing, optionally fix up your branch history, make a pull request on GitHub, merge to mainline. (Or squash-merge to end up with a single bug or feature commit.) Chat software or voice call to talk about conflicts.
Intern: Everything the light touches? What about the .git directory? Me: That is beyond our borders. You must never go there. Intern: But I thought a developer can do whatever they want.
The git folder internals is something I use as part of explaining how easy and useful git is, and why git is so lightweight in branching/merging compared to alternatives like TFS. I can also use it to easily fix errors, like deleting/modifying branches quickly.
I use .gitignore to ignore files that everybody is likely to see, like build products. I use .git/info/exclude to ignore stuff that only matters to me, like my private “test” subfolder.
Previous hash is a bit like a blockchain. You can do clone between directories on the same computer. Or using plain ssh to your account on a server where you only have terminal access (no webserver etc).
@@__Brandon__ "a bit like", I didn't say it was. ;-) Yes, you can rewrite the git history, since you have access and can change which node in the ledger is the last checked-in one for that branch. The start (or latest check-in) is stored in git as a link to that node, which links to previous versions. That is how you end up with different branches: each one is a link to the hashed node where it is located. So when you rebuild the chain, you also set the branch's start to another node. Blockchains have different ways of reaching consensus on what the start is, so you can't easily change where the latest entry is. At least that is my understanding of git's inner workings.
Does that mean that every time I run git diff it uncompresses both commits to compare the file contents? Wouldn't that be slow for huge projects? I had thought it would do it more efficiently with some data structure.
It actually does have a more efficient structure. Trees and blobs are addressed by the hash of their contents, so when git compares two commits it can skip any file or whole subdirectory whose hash is the same on both sides, and it only has to decompress the objects that actually differ. The commit tells git which root tree to start from.
There is no git service on a client. The git process acts on a local git repository (which can be a clone of a remote repository), running as the local user. Therefore, it only has the same access to any local objects as allowed by those objects' permissions and ownership. While it is possible to access a repo on a shared filesystem, this is not the normal process.
Can you skilled guys show us which browser extensions (Firefox) can't be trusted? I use uBlock Origin, User-Agent Switcher, LocalCDN, ClearURLs, Containers, LastPass and Grammarly, with NordVPN on top, but I fear the extensions may be reporting home.
We would really like to see how conflicts are actually resolved in, say, a team with 10 or 15 members. Computerphile, can you create a small video on that?
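To make the "huge project" worry a bit more concrete, here is a toy Python model of comparing two snapshots by hash. Everything in it is made up for illustration (the `store` layout, the short fake hashes, the function name `changed_paths`); it only shows the idea that equal hashes let a comparison skip whole subtrees without reading them, not how Git's diff machinery is really written.

```python
from typing import Dict, List, Tuple

# Toy object store: tree hash -> {entry name: (kind, object hash)}.
# The hashes are invented labels, not real SHA-1 values.
Tree = Dict[str, Tuple[str, str]]

store: Dict[str, Tree] = {
    "t1": {"README": ("blob", "b1"), "src": ("tree", "t2")},
    "t2": {"main.c": ("blob", "b2"), "util.c": ("blob", "b3")},
    "t3": {"README": ("blob", "b1"), "src": ("tree", "t4")},
    "t4": {"main.c": ("blob", "b9"), "util.c": ("blob", "b3")},
}


def changed_paths(store: Dict[str, Tree], old: str, new: str, prefix: str = "") -> List[str]:
    """List the paths whose entries differ between two trees."""
    if old == new:
        return []  # same hash means same subtree, so nothing below is read
    old_tree, new_tree = store.get(old, {}), store.get(new, {})
    changed: List[str] = []
    for name in sorted(set(old_tree) | set(new_tree)):
        old_entry, new_entry = old_tree.get(name), new_tree.get(name)
        if old_entry == new_entry:
            continue  # unchanged file or unchanged directory
        path = prefix + name
        if old_entry and new_entry and old_entry[0] == new_entry[0] == "tree":
            changed += changed_paths(store, old_entry[1], new_entry[1], path + "/")
        else:
            changed.append(path)
    return changed


print(changed_paths(store, "t1", "t3"))  # ['src/main.c']
```

Only the one blob pair that differs would need to be decompressed and diffed line by line; `README` and `util.c` are never opened.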
I'm afraid this guy doesn't really know what he's talking about.
The 'logs' directory doesn't keep "git log". It keeps reflogs for each ref separately: HEAD, each branch under heads/, each branch under remotes/, etc. A reflog is a log of changes to a ref. Every time you make a commit (by "git commit", or amending, or cherry-picking, or rebasing), or reset the current head with the checkout or reset command, a record gets added to the reflog. You can show the reflog contents with the "git reflog" command. You can find all the commits you ever made there, though by default entries are set up to expire after 90 days.
An object's hash refers to its plaintext contents, NOT to its compressed contents. You can use different containers, databases and compression algorithms for storing objects, and their hash will always be the same. The loose files in the objects/ subdirectories are zlib-deflated on disk, but the hash is computed over the uncompressed data. That data begins with a header containing the object type and size. Having the object size in the header protects a bit against constructing a SHA1 collision (which involves appending a tail). Also, having the object type there makes sure objects of different types always have different hashes, even if their bit contents are identical. You can also pack the objects, in which case they get compressed and delta-encoded as well. The packs go to the objects/pack/ directory.
"git commit" does NOT zip your files. All it does is calculate the hash of the root tree of your index and record that tree hash in your commit object.
'config' is not a folder, it's a file. 'index' is not a folder, it's a file. Since it's a local file, "20 people" will never try to add information there, causing a need to merge. It's your own local file, independent of other people.
Especially since git's conflicts are inherently visual. They have arrows and such made out of equals signs and angle brackets, and lots of software will color-code the conflicts vs the normal contents. :)
Maybe the next video will visit the pack directory and show both its contents and the contents of the objects directory before and after running git gc --aggressive.
Loved the description but the camera movement was awful. I got through to the end but now feel nauseous. Please don't record if you've forgotten your tripod!
Merge conflicts are the bane of my existence. For reasons, just a few weeks ago, I ended up with 15 files filled with about 40 merge conflicts at work. It was an absolute nightmare to deal with. So painful.
@@TheGreatAtario It is annoying, but that is why many shops require merging the target (such as master) into the source (such as your feature branch) first in order to check for that, then merging the resulting source back to the target only after you've verified it works. It's an imperfect workaround, because it spaghettifies your merge history, but the alternatives that don't leave a mess in the commit history are often harder to enforce on the more junior developers.
@@manvesh97 Yep. Commit after basically every change, and push only one feature/bugfix per branch, and when starting a new branch always begin with an up-to-date branch that's pulled all the latest changes. That's usually how we work anyway; it's just this one client who wanted changes from a version about 20 minor versions ahead of the one they were using, but didn't want to actually update to that version.
In the section about accessing other resources, I wonder how git handles symbolic links (I assume a linux environment). Does anybody know what git actually records when it encounters a symbolic link? I'm especially interested in what happens when it encounters a symbolic link to a file outside the directory tree of the repository ("ln -s /some/other/folder/sensitive_data sensitive_data").
@@__Brandon__ Thanks, Brandon. I know what a sym link is and how it is implemented in Linux (and Unix before that). Since presumably git does not record individual inodes, I wonder what mechanism it uses to represent a symlink.
@@__Brandon__ : I appreciate your response. I fear I'm being unclear in my language. When a git transfer from one system to another occurs, there are only bits -- no directories, files, links, or anything else -- on the wire. When a git repo is cloned from one system to another, something special must be done with each symlink. The file systems of the origin and destination system of such a clone are likely to be entirely different. Suppose that a repo on system A has symlinks to some targets on system A that are outside the repo. Suppose that repo is cloned to system B. How is each symlink represented in the bits that cross the wire, and how is the representation expanded on system B? What happens on system B if a symlink target is not present on system B when the clone begins -- does git show a broken link? Does git, for example, use a text representation that literally expands back into an "ln" command during the clone? I ask because these are sometimes security leaks in Linux systems. I've encountered situations where an attempted symlink fails one way if the target doesn't exist at all and a different way if the target exists and has permissions that block access. The difference between those reveals information about otherwise-private directory structures. If git allows unrestricted use of symlinks, then I wonder if there are ways to use its behavior while cloning to similarly leak information about a system. I suppose all this goes under the rubric of "injection attack", as in "sql injection attack" -- using carefully constructed abuses of one mechanism to reveal information about another. This was all triggered by the brief question about security concerns mentioned in the video (4:55).
@@thomasstambaugh5181 So, in addition to a symlink essentially just being a text file storing the target/destination, there's also the file "mode" which is what distinguishes it from a regular text file. Git also tracks that mode. When you clone/pull a repo, git creates the symlink even if the target/destination doesn't exist (creating a broken link). It's no different than if you manually do ln -s to a non-existing target/destination.
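For anyone who wants to poke at this, a small Python sketch of what gets recorded for a symlink, as far as I understand it: the link text itself is hashed like a blob, and the tree entry carries mode `120000`. The helper names and the example path (borrowed from the question above) are only for illustration, and the target deliberately does not exist.

```python
import hashlib
import os
import tempfile

SYMLINK_MODE = "120000"  # tree-entry mode Git uses for symbolic links


def symlink_blob_id(link_path: str) -> str:
    """Hash a symlink the way a blob is hashed: the content is the target path."""
    target = os.fsencode(os.readlink(link_path))  # the link text, never the file behind it
    header = b"blob %d\0" % len(target)
    return hashlib.sha1(header + target).hexdigest()


def demo() -> None:
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "sensitive_data")
        # The target is outside the directory and need not exist (a broken link).
        os.symlink("/some/other/folder/sensitive_data", link)
        print(SYMLINK_MODE, "blob", symlink_blob_id(link), "sensitive_data")


if __name__ == "__main__":
    demo()
```

Note that nothing from the target file is read, so the contents of `/some/other/folder/sensitive_data` never enter the hash; only the path string does, which is exactly the information a committed symlink reveals.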
@@jessodum3103 : Got it, thanks. In the context of the very brief comment in the video about security, this is perhaps worth mentioning. The presence of broken links in a git clone reveals information about the directory structure outside of the directory tree that comprises the git repo. If those links work when they're not supposed to (such as in a misconfigured system), then those links ARE breaches. My point here is that some caution is advised when allowing git to commit symbolic links.
It’s actually a “directed acyclic graph”. A commit can have a single parent, multiple parents (merge commit) or even no parents at all (like the first commit in a repo). Trivia question: can there be more than one commit in a repo with no parents?
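A tiny Python model of such a graph, just to make the shape concrete. The commit names and the `history` dictionary are invented for illustration; real commits are identified by hashes and store their parents inside the commit object.

```python
from typing import Dict, List, Set

# Toy history: commit name -> list of parent commit names.
history: Dict[str, List[str]] = {
    "A": [],          # a commit with no parents
    "B": ["A"],       # an ordinary commit with a single parent
    "X": [],          # another parentless commit, e.g. an unrelated line of history
    "Y": ["X"],
    "M": ["B", "Y"],  # a merge commit with two parents
}


def root_commits(parents: Dict[str, List[str]]) -> List[str]:
    """Commits that have no parents at all."""
    return sorted(commit for commit, ps in parents.items() if not ps)


def ancestors(parents: Dict[str, List[str]], tip: str) -> Set[str]:
    """Every commit reachable from `tip` by following parent links."""
    seen: Set[str] = set()
    stack = list(parents[tip])
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


print(root_commits(history))           # ['A', 'X']
print(sorted(ancestors(history, "M")))  # ['A', 'B', 'X', 'Y']
```

Because links only ever point from a commit to its parents, following them can never loop back, which is what makes the graph acyclic.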
@@lawrencedoliveiro9104 yes, there's even a --root flag for git rebase, so you can create a new commit without parents if your rebase will modify the original one
6:50 That's not true either. Git doesn't care about which files are changed. You just can't push to non-existent things. The moment a branch's head is changed on the remote, your local branch head is not gonna be the same thing and you have to update it first. This has nothing to do with the content of the files changed; in fact, it could be an empty commit too.
Just an addition: if someone wants to change or add to the default behaviour of these commands, they can take advantage of the .git/hooks/ directory, e.g. to prefix/postfix the commit message, or to not let you push if a commit contains specific strings like [WIP].
I like exploring hitherto unknown places, where no human has set foot, the darkest dimmest recesses of untouched territory yet to be found, hidden and harbouring magical secrets - in short, *.git*
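Put as a sketch: whether a plain push goes through depends on ancestry, not on file contents. The Python below models that check on a toy parent map; the names `is_ancestor` and `push_allowed` and the single-letter commits are made up, and real servers can apply extra rules on top of this.

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], ancestor: str, descendant: str) -> bool:
    """True if `ancestor` can be reached from `descendant` via parent links."""
    stack, seen = [descendant], set()
    while stack:
        commit = stack.pop()
        if commit == ancestor:
            return True
        if commit in seen:
            continue
        seen.add(commit)
        stack.extend(parents.get(commit, []))
    return False


def push_allowed(parents: Dict[str, List[str]], remote_head: str, local_head: str) -> bool:
    """A plain push is a fast-forward: the remote head must be in our history."""
    return is_ancestor(parents, remote_head, local_head)


# Toy history: C and D were both made on top of B; E builds on C.
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["C"]}

print(push_allowed(parents, remote_head="C", local_head="E"))  # True
print(push_allowed(parents, remote_head="C", local_head="D"))  # False
```

In the second case the remote moved to `C` while we committed `D` on top of `B`, so we would have to fetch and merge or rebase first, even if `C` and `D` touch completely different files.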
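As a rough example of the idea, here is a `commit-msg` hook written in Python. It is a variation on the push case: it refuses the commit itself when the message contains the marker. The `[WIP]` marker and the function names are assumptions for illustration; the parts I am fairly sure of are that Git runs `.git/hooks/commit-msg` with the path of the message file as its first argument and aborts the commit when the hook exits non-zero.

```python
#!/usr/bin/env python3
import sys
from typing import List

BLOCKED_MARKER = "[WIP]"  # hypothetical marker, pick whatever your team uses


def message_is_acceptable(message: str) -> bool:
    """Reject any commit message that still carries the marker."""
    return BLOCKED_MARKER not in message


def main(argv: List[str]) -> int:
    # argv[1] is the file holding the proposed commit message.
    with open(argv[1], encoding="utf-8") as handle:
        message = handle.read()
    if message_is_acceptable(message):
        return 0
    print(f"commit rejected: message contains {BLOCKED_MARKER}", file=sys.stderr)
    return 1  # a non-zero exit status makes Git abort the commit


# Only act as a hook when a message file path was actually passed in.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

To try it, save it as `.git/hooks/commit-msg` and make it executable. Hooks live in your local `.git/` directory and are not copied by a clone, so each person has to install them.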
A hash of an object is run over: 1) a prefix line containing the object type (blob, tree, commit, tag) and size; 2) the object's contents, in plaintext. As you can see, objects of different types will have different hashes, even if their binary contents are identical. Also, having the object size in the prefix line guards a bit against constructing a SHA1 collision. This is also the data that gets stored (zlib-deflated) under the .git/objects/XX/ subdirectories. The objects can also be packed and delta-encoded. This doesn't change their hash, since it's always the plaintext hash.
No need for sudo or root privs. git is just a normal command like cat, grep, awk etc. Git commands manage a repository - which is nothing more than a collection of files - you own, just like any other files. Everything in a .git/ directory belongs to your user the same as your project/code files. There’s nothing special about .git/ or its contents that requires any root privileges more than any other jpeg or text file.
Amazing stuff (!) - exactly the same sort of functionality we had in 1984 when I started working at IBM on an OS with ~400 people working simultaneously on the code.
In those days, it was believed you had to lock files when they were checked out, because if two people made changes to the same file, the world would end or something.
@@lawrencedoliveiro9104 I’m pretty sure we had the ability to have multiple updates to a module. Each module had an owner who was responsible for resolving any overlapping updates.
@@lawrencedoliveiro9104 Tricky that - internal dev team product used in IBM Labs … citation is my memory! Recall that back in 1984 we also had a global network with 30,000 users, email, remote access, file transfer, and a Reddit like system for sharing random stuff. Folk like Mike Cowlishaw invented Rexx as a hobby. There used to be smart people in IBM Labs!
@@Richardincancale I’m sure there were smart researchers at IBM, but somehow their shipping products rarely reflected that -- they were clunky, resource-hungry and overcomplicated compared to, say, DEC and Unix systems.
The folder “.git/logs” is not for the command “git log” but for the command “git reflog”.
timestamp 1:00
The command “git log” shows commit messages, which are stored in the commit objects under the “.git/objects” folder.
So it's entirely magic, got it.
exactly
Technical problems that are advanced enough look like magic. Until you understand them.
Torvalds: it's Finnish for *#!@ing magic.
There are many magical things this channel presents that just a few can fully grasp, but this time it's pure and simple to understand. Git is awesome
@@okinnivlek Wait Linus is behind git?
00:22 how to investigate .git folder
00:33 HEAD, etc
00:50 config folder
01:00 log folder
01:20 objects folder
01:33 what happens on git commit: (1) compress current files with zlib, generates names with sha1
01:48 3 types of objects: (1) commit, (2) folder view, (3) each file
02:00 show object type: git cat-file -t
02:44 show object content: git cat-file -p
03:06 show object tree: git cat-file -p
03:34 show file content: git cat-file -p
04:01 summary: what is git object
04:15 naming in .git/objects/ (optimization)
04:56 question: root access/security
06:09 question: how git handles conflicts
07:27 index folder: stores what's gonna be your next commit (staged)
This table of content was created using "Smart Bookmarks for TH-cam" chrome extension.
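The 01:33 and 04:15 entries (zlib, SHA-1 names, the two-character subfolders) can be tried out with a few lines of Python. This is a sketch under my own assumptions about the loose-object layout, with made-up helper names, writing into a scratch directory rather than a real repository.

```python
import hashlib
import os
import tempfile
import zlib
from typing import Tuple


def write_loose_object(objects_dir: str, kind: str, content: bytes) -> str:
    """Store an object the way a loose object is laid out and return its id."""
    data = f"{kind} {len(content)}".encode() + b"\0" + content
    oid = hashlib.sha1(data).hexdigest()  # the name comes from the uncompressed data
    path = os.path.join(objects_dir, oid[:2], oid[2:])  # first two hex digits become the folder
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as handle:
        handle.write(zlib.compress(data))  # only the bytes on disk are compressed
    return oid


def read_loose_object(objects_dir: str, oid: str) -> Tuple[str, bytes]:
    """Rough equivalent of asking for an object's type and its contents."""
    with open(os.path.join(objects_dir, oid[:2], oid[2:]), "rb") as handle:
        data = zlib.decompress(handle.read())
    header, _, content = data.partition(b"\0")
    kind, size = header.split(b" ")
    assert int(size) == len(content)
    return kind.decode(), content


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as objects_dir:
        oid = write_loose_object(objects_dir, "blob", b"test content\n")
        print(oid)  # d670460b4b4aece5915caf5c68d12f560a9fe3e4
        print(read_loose_object(objects_dir, oid))
```

Splitting on the first two hex digits keeps any single directory from collecting an enormous number of files, which is the optimization the 04:15 entry refers to.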
To learn about the details, get an old version of git (v1.0.0 or v1.3.0 which doesn't require an old version of openssl's BIGNUM).
All the commands were simple shell scripts that called simple c-programs that manipulated the index and created the blob, tree and commit objects. "git init" was git-init-db and it created the .git/ directory. "git add" was a shell script that used git-ls-files (a c-program) to get information about the current directory structure and used this information as input for git-update-index. "git commit" was a shell script that let you write a commit message and created the commit object and updated the HEAD reference.
All these scripts and programs were small and easy to understand.
As I recall, one reason why a lot of that simplicity had to go away was because shell scripts and subprocesses do not work well under Microsoft Windows.
I think this was about the time Mozilla were looking to move to one of these newfangled distributed VCS things (likely they were using Subversion before). But one important criterion was that it had to work well under Windows. Sadly, that ruled out Git (at the time). So they went with Mercurial instead.
BIGNUM?
Okay, what happened here?
thanks a lot for the information you both @zvpunry and @lawrence D'Oliveiro
4:55 NO! Git does not inherently have root access, it only has the permissions of the user which runs the git commands. If you're silly enough to use sudo with git then sure it'll have root access for each command you sudo, but generally it's just your permissions.
I imagine there may have been some confusion such as the folder containing the contents of the repository being described as a "root" folder.
Right, came to see if anyone commented this yet - git works with the same ACLs on the folder as the user that is executing the commands, and that's it. Git has the same permissions as your normal user account, no more, no less.
The fun begins when someone does commits or other write operations as root, then goes back to the repo without root privileges
yeah i wanted to point this out too. it was kind of a silly question because most programs don’t deal with permissions security at all. they simply do what they must and the OS manages security by stopping them/you when insufficient permissions arise. then you can choose to elevate permissions or back off. there’s really no silver bullet here but git (and most programs) doesn’t care what user it is running as.
@@caedenw I suspect he was thinking git runs as a service - it does not. That's why all git commands start with 'git ' - you're literally executing git at that point in time.
Max was my lecturer last year and by far one of the best ones in UoN. Thank you for being such an amazing professor!
git runs as the user and is not root unless you run it as root.
This is important. Most console commands run as the user who invoked them. Special commands can have their setuid (SUID) bit set, which changes who they run as, but most are gonna run as the user who invoked them.
@@faeranne Yep and this confuses Windows users 😜. Don't think Windows has an analog of this.
@@faeranne yep, and generally you want as few commands as possible in your system to have that setuid bit set (which means they are going to run as their owner, instead of the user who runs them), even if they have a very specific usage, because it's really hard to ensure your software can't be used to do something unintended, so they can be an important source of security issues.
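If you want to check a particular binary, the bit is visible in the file mode. A small Python sketch for Unix-like systems follows; the helper names are made up, and looking up `git` on the PATH is just an example.

```python
import os
import shutil
import stat
from typing import Optional


def mode_has_setuid(mode: int) -> bool:
    """True if the setuid bit is set in a file mode."""
    return bool(mode & stat.S_ISUID)


def runs_as_owner(path: str) -> bool:
    """True if the executable at `path` would run as its owner, not its caller."""
    return mode_has_setuid(os.stat(path).st_mode)


def report(command: str) -> Optional[bool]:
    """Look a command up on the PATH and report its setuid status."""
    path = shutil.which(command)
    return None if path is None else runs_as_owner(path)


if __name__ == "__main__":
    print("git is setuid:", report("git"))
```

On an ordinary installation I would expect this to report `False` for git, which fits the point made earlier in the thread that it simply runs with the permissions of whoever invoked it.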
In Windows, the usual answer if some procedure gives you trouble is: “have you tried running it as Administrator?”
7:50 Some people might be accustomed to other VCSes which automatically assume that every change you have made to your working copy is to be included in the next commit (e.g. Subversion works this way). Git doesn’t do this -- it requires you to explicitly _add_ changes to the “index” to be included in the commit. This is because Git recognizes that you often make multiple sets of changes -- for example, while fixing one thing, you notice something else that needs fixing, that is quite unrelated. So when you actually get round to committing your changes, you can pick and choose which ones go into the same commit, and which ones to leave for another separate commit. And some things, like debug message lines, can be left out altogether.
Some Git client software will allow you to commit changes you haven't added to the index manually. Also, some Subversion client software will let you check in subsets of the changed files in your workspace.
My pet theory is that git really grew out of Linus's scripts for managing patch files. The git "index" is really a draft patch message.
@@AdrianColley The git index file keeps a complete representation (hashes) of your checked out tree and of the staged tree. To speed up diffing and status command, it also keeps timestamp information of all checked out files.
When you do "git add", git stores the contents of each file to be staged as a blob, and saves this blob hash in the index (there is an option to just record an intention to stage, without calculating the hash and storing the blob, but it's not normally used).
When you make a commit, git calculates hashes of all updated (sub)trees in the index and writes those trees as new tree objects. The root tree hash then goes to the commit.
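To make the blob/tree/commit chain above a bit more concrete, here is a minimal Python sketch of how the object IDs could be computed. It assumes SHA-1 object IDs and the usual "type size NUL body" framing; the helper names (`object_id`, `tree_body`, `commit_body`) and the author string are made up for illustration, and the tree sort is simplified (real git treats directory names specially when sorting).

```python
import hashlib


def object_id(obj_type: str, body: bytes) -> str:
    """Hash "<type> <size>\\0<body>" with SHA-1, as git does for its objects."""
    header = f"{obj_type} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


def tree_body(entries) -> bytes:
    """Serialize (mode, name, hex_id) entries into the body of a tree object.

    Simplification: entries are sorted by plain name only.
    """
    out = b""
    for mode, name, hex_id in sorted(entries, key=lambda e: e[1]):
        out += f"{mode} {name}".encode() + b"\x00" + bytes.fromhex(hex_id)
    return out


def commit_body(tree_id: str, message: str, who: str) -> bytes:
    """Build a parentless commit body; `who` is a hypothetical identity line."""
    return (
        f"tree {tree_id}\n"
        f"author {who}\n"
        f"committer {who}\n"
        f"\n{message}\n"
    ).encode()


# "git add" step: the file contents become a blob.
blob_id = object_id("blob", b"hello world\n")
# "git commit" step: the staged entries become a tree, the tree goes into the commit.
root_tree_id = object_id("tree", tree_body([("100644", "hello.txt", blob_id)]))
commit_id = object_id(
    "commit",
    commit_body(root_tree_id, "first commit", "Example <example@example.invalid> 0 +0000"),
)
```

The point is only that the commit names the root tree by hash, and the tree names each blob by hash, so nothing in the commit depends on file contents directly.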
By the way, you can see the whole history of Git development in its repo.
@@davidfrischknecht8261 The "original" Git command line client allows you to commit without adding to the index. Internally, it just makes a new temporary index file for that.
@@Alexagrigorieff do you mean doing a commit without an add? You can still do that today, just do 'git commit '
I love the computerphile videos. If you can make one on docker that would be really interesting. ❤️
The recording of a terminal at 0:47 is a bit misleading. There's no need to use sudo when accessing a file to which you have read access (assuming this is the case, as the file is within the "pi" home directory), and the path specified ("/Documents/git_test/.git/HEAD") is incorrect; it should be "~/Documents/git_test/.git/HEAD".
Just a reflex action when I was adding some illustrative stuff - you probably guessed but that isn't Max's terminal -Sean
@@Computerphile no worries :) nice to see a Raspberry Pi being used for the demonstrations!
@@Computerphile You can use tab to autofill directories/files in the path and the "tree" command is helpful to show the folder structure of small-ish folders.
I was alarmed to see `sudo nano`. There's a technical name for that: a footgun. You should only use `sudo` when you're sure you need elevated privileges. In any case, `sudoedit` is a safer way to do `sudo $EDITOR` when it's needed.
He talks about sending changes between two folders via a server. I'd like to point out that git doesn't actually need/care-about servers; we can push/pull changes directly from one local folder to another if we like; we can make clones of a folder; and so on. I like to have a folder full of "pristine" repos which I pull and push commits to; whilst making and committing changes in separate "working copies" scattered around my computer as needed. We can also send changes back/forth to multiple servers; for example, I host copies of my git repos on my own server, but I also have copies on GitHub (which acts like a "mirror").
That sounds like an interesting way to keep the "official" commits clean and to the point, but you definitely "lose" or "hide" some of the development history, which can be just clutter depending on the quality of those internal commits, so maybe no real loss there.
Worktrees ought to simplify this workflow for you
I think it is quite common to clean up and simplify your private commit history before sending a bunch of patches to the central authorities for a project. For example, you probably don’t want them seeing every little typo correction or whitespace fixup as a separate commit.
@@lawrencedoliveiro9104 Someone tell that to my junior devs ;_;
@@ali_p_q7920 that is how you work with Git. You put some stuff in your local copy, clean up the history and push those to the server.
The thing with Git is that every user has their own copy of the repository. And when they want, they send the differences to some other user, who can merge them into their own code base.
What GitLab/GitHub do is provide a place to store those repositories so they can be shared between users.
The best blockchain!
The only useful one
I also use Git for keeping backups of my personal documents and my wife's. Git was never intended for that purpose, but it actually works great. It especially prevents overwriting newer documents with older versions, which tends to happen if you manually copy files to a backup server or drive.
If your documents are text files, it's perfect for git. But if it's doc, it will still work but will be extremely inefficient.
@@thexavier666 Only in terms of diffing, since it's not storing deltas; space-wise there's no difference.
@@AlexAegisOfficial Space wise there is a difference, since your git history also contains all older versions. Since it can't do deltas it contains copies of all older versions.
tbf word also has a version control system.
@@stt.9433 I guess that's per document and git manages a whole folder, multiple documents which is the most common usecase, you want related documents (or even all of them) in one place, plus the built in is probably garbage
8:06 I'd definitely be interested in watching a Computerphile video about how larger teams use Git and handle merge conflicts and such issues
up
Me too
But with someone else, this guy is not very clear and always keeps quite vague. Also the previous video was not great, git deserves much better!
@@mcol3 I agree
In short, private forks and pull requests. Combined with CI builds to validate that it works. We merge about 100 PRs a day at my current project and so far it works fine 🙂
3:41 Tip: You can do “git show --raw «commit»” to get a list of the files affected by that commit. You can similarly list the files affected by each entry in the log with “git log --raw”.
It also shows the file renames, which Git is trying to guess.
For those who don't know, in Git, a commit doesn't store information of file rename. A rename appears as a deleted file and an added file with the same or (not much) different contents.
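A rough Python sketch of the guessing described above: pair each deleted file with the most similar added file and call it a rename if the similarity is high enough. This is an illustration only. Git's real similarity measure is different; `difflib` is swapped in here as a stand-in, the function name `guess_renames` is made up, and the 0.5 threshold just mirrors git's default of 50% similarity.

```python
from difflib import SequenceMatcher


def guess_renames(deleted, added, threshold=0.5):
    """Pair deleted paths with added paths whose contents look alike.

    deleted / added: dicts mapping path -> file contents (str).
    Returns a list of (old_path, new_path, similarity) tuples.
    """
    renames = []
    remaining = dict(added)  # added files not yet claimed by a rename
    for old_path, old_text in deleted.items():
        best = None
        for new_path, new_text in remaining.items():
            score = SequenceMatcher(None, old_text, new_text).ratio()
            if score >= threshold and (best is None or score > best[1]):
                best = (new_path, score)
        if best is not None:
            renames.append((old_path, best[0], best[1]))
            del remaining[best[0]]
    return renames
```

For example, `guess_renames({"old.txt": text}, {"new.txt": text})` reports a rename with similarity 1.0, while two files with nothing in common are left as a plain delete plus add.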
thanks a lot
Tig does this as well, on top of showing the differences between each commit.
It's actually pretty interesting that git always stores the whole file. I thought it would just store the diff of files and reconstruct the latest state by reapplying all changes. Instead they store the state of the tree. Learnt something new today. Thanks ❤
By the time git was created storage was cheap enough that I guess it made sense to optimize for speed over space.
@@JonasBergling I always thought that the diff is very central in git. But apparently it's very easy to create, if you can get the complete state of the repo for two commits.
@@florianfanderl6674 yeah, it is. But if you keep the complete separate versions of the files it's trivial to run one of the many existing diff tools on them, so why not just do that, I guess.
Git initially stores every version of a file as a whole (compressed) object. When objects are packed into .pack files (for example by git gc), similar objects are recorded as deltas against each other whenever it makes sense (e.g. text files), which makes Git much more space efficient.
@@rjmaas ah OK. So they still use diffs. Maybe I need to read a bit more 😊
New guy at work got his 1st merge conflict today, and he got frustrated and left
Left the room or the job?
@@coompiler9029 i'll find out next monday haha
Great video!
I rarely catch a video this early!
I would expect a better explanation from computerphile. A talking man might not be as strong without enough diagrams, animations, demos. Cheers for the dude though, appreciate the explanation.
I agree, his explanation is pretty rough. There is a Russian guy who does a fantastic job explaining how exactly git works under the hood in detail on YouTube: 3FKrszHcIsA
@@huqiao thanks for sharing the video link as well (:
5:22 You can set your working directory to any subdirectory of a project directory. Git will automatically go up into parent directories until it finds one containing a .git subdirectory, or until it hits a filesystem boundary, at which point it gives up.
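The upward search can be sketched in a few lines of Python. This is only an approximation of the behaviour described above, not git's actual code: the function name `find_repo_root` is made up, and "filesystem boundary" is approximated by comparing device IDs.

```python
from pathlib import Path
from typing import Optional


def find_repo_root(start: Path) -> Optional[Path]:
    """Walk up from `start` until a directory containing `.git` is found.

    Gives up (returns None) at the filesystem root or when the parent
    directory lives on a different device than the starting directory.
    """
    current = start.resolve()
    start_device = current.stat().st_dev
    while True:
        # `.git` is usually a directory, but it can also be a file.
        if (current / ".git").exists():
            return current
        parent = current.parent
        if parent == current:  # reached the filesystem root
            return None
        if parent.stat().st_dev != start_device:  # crossed a mount point
            return None
        current = parent
```

Calling `find_repo_root(Path.cwd())` from any subdirectory of a project returns the project's top-level directory, which is why git commands work wherever you are inside the working copy.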
thanks for info (:
No need for a big sluggish relational database. The filesystem is the database. Fast, consistent, works on every system and OS. No overengineering here 👍
sure it's fast, but even for my personal bookkeeping purpose, i find it quite limiting.
Filesystems are not fast for complex operations. Git does relatively simple operations like read/write, but databases are optimized for indexing, processing, storing and querying data relatively often.
Git is slow in relation to databases for those purposes, but works really well for what it _is_ built for - version control.
A Frenchman tried to push a repo when someone else had already pushed a change to upstream.
«Merge alors !»
Marvelous video! As always good job Sean! I am loving Computerphile!
would love to see even more detailed videos on git plumbing and stuff
Please do more videos on git, I find it very interesting and educative!
can we all appreciate Linus Torvalds's level of genius?
I wish I could tap into some of that genius.
And of guts when he gave the finger to Nvidia. Did I miss his reaction to Microsoft buying GitHub?
Well, OK, but let's not pretend that git's design is any evidence for that genius.
@@AdrianColley What about linux?
@@Manuite That's a much better example.
Good video. You can tell people's age by them using the term "folder" while typing "change directory". I feel old!
Thanks for the insights into the stuff going on in the background!
Kevin Kühnert
Siamese twins. Obviously he got the brain during separation.
😂
@@Phillip3223 I've got absolutely 0 patience for a political debate in the youtube comments but I politely disagree with you about the brain thing
@Computerphile It would be interesting to see a video on "patch theory", which has been used by Darcs (pre-git) and Pijul (more recent)
Hi, this isn't meant to sound like a complaint, but a suggestion for an improvement. I noticed that the camera is shaking a little bit throughout the interview. It kinda bothers me, but nonetheless, I still enjoyed this video very much!
Erm, those blurred email addresses are readable if you apply a bit of logic
An interesting video for sure. I would appreciate seeing a discussion about the .gitignore file. Thanks for the excellent video!
*> Git's managing all this stuff, it's probably got kind of root access or something out there. How does that work in terms of security?*
Git does _not_ have root access to anything. It runs under the same permissions as the user that's using it, so, if your user name is `computerphile`, then Git will run under the same level of access as that user, in the local directory. Git is not a user, and using the `git` command is no different than using the `ls` command or any other external utility. The corollary of this is that _if_ the `root` user is running Git commands as `root`, then Git will be running with `root` privileges. In practice, this means that `.git/` and the files it creates under it will be owned by `root`.
This video would be so much more helpful with visuals. I love your other content with animations and notepad diagrams.
if 20 people are changing the same file, either that file is far too massive or something's gone very wrong with team communication :D
The file could also be riddled with bugs or incomplete features, which would necessitate so many people working on it. :)
This was an awesome chat! Cheers 🍻
would love to see a video on how teams use git and github, with the push pull workflow and also conflicts
Google "gitflow" and you will learn how most teams, big or small use git.
@@Jonteponte71 thanks🙌
In my experience it's not too complex. Make a feature branch, lots of commits while you build or fix the thing, optionally fix up your branch history, make a pull request on GitHub, merge to mainline. (Or squash-merge to end up with a single bug or feature commit.) Chat software or voice call to talk about conflicts.
Never thought there was more to learn about version control despite my daily use of it for like the last 5 years of my life.. lol
Intern: Everything the light touches? What about the .git directory?
Me: That is beyond our borders. You must never go there.
Intern: But I thought a developer can do whatever they want.
Brilliant video, very informative and concise :)
Merge conflicts are the curse of every software developer.
"testing 2" is my go-to commit message
Great video. I had never before thought about how git actually worked. This was truly revealing.
The book Pro Git should be mandatory reading for any one who wants to understand Git.
The git folder internals is something I use as part of explaining how easy and useful git is, and why git is so lightweight in branching/merging compared to alternatives like TFS. I can also use it to easily fix errors, like deleting/modifying branches quickly.
I use .gitignore to ignore files that everybody is likely to see, like build products. I use .git/info/exclude to ignore stuff that only matters to me, like my private “test” subfolder.
I don't think even Microsoft uses TFS anymore.
Previous hash is a bit like a blockchain.
You can do clone between directories on the same computer. Or using plain ssh to your account on a server where you only have terminal access (no webserver etc).
It's exactly a blockchain!
@@AdrianColley And it's the only useful (and valuable) application of a blockchain.
@@__Brandon__ "a bit like", I didn't say it was. ;-)
Yes, you can rewrite the git history, as you have access and can change which node in the ledger is the last checked-in one for that branch. That starting point (the latest check-in) is stored in git as a link to that node, which in turn links to previous versions. That is how you end up with different branches: each one is a link to the hashed node where it is located. So when you rebuild the chain, you also set the branch's start to another node.
Blockchains have different ways of getting consensus on what the start is, so you can't easily change where the latest entry is.
At least that is my understanding of git's inner workings.
awesome, and thanks for the explanations
Does that mean that every time I run git diff it uncompresses both commits to compare the file contents? Wouldn't that be slow for huge projects? I had thought it would do it more efficiently with some data structure.
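It actually does have a more efficient structure. Every tree records a hash for each file and subdirectory, so git diff only has to read the entries whose hashes differ between the two commits; identical files and whole identical subtrees are skipped without being opened. The commit tells git which root tree to start from.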
What illegal images? OH NO! You’ve got me now!
SHA-1 was broken by a Dutch institute this past week. It has been considered insecure for at least 4 years, but now it has actually been broken.
There is no git service on a client.
The git process acts on a local git repository (which can be a clone of a remote repository), running as the local user.
Therefore, it only has the same access to any local objects as allowed by those objects' permissions and ownership.
While it is possible to access a repo on a shared filesystem, this is not the normal process.
can you skilled guys show us which extensions can't be trusted in browsers (Firefox)? I use uBlock Origin, User-Agent Switcher, LocalCDN, ClearURLs, Containers, LastPass and Grammarly, with NordVPN on top, but I fear the extensions may be reporting home
We would really like to see how conflict resolve is actually solved in say a team with 10 or 15 members, computerphile can u create a small video on that ??
I am triggered every time he says folder instead of directory
I'm afraid this guy doesn't really know what he's talking about.
'logs' directory doesn't keep "git log". It keeps reflogs for each ref separately - HEAD, each branch under heads/, each branch under remotes/, etc. Reflog is a log of changes of a ref. Every time you make a commit (by "git commit", or amending, or cherry-picking, or rebasing), or reset the current head by checkout or reset command, a record gets added to the reflog. You can show the reflog contents by "git reflog" command. You can find all commits you ever made there, though by default it's set up to expire after 90 days.
Object's hash refers to its plaintext contents, NOT to its compressed contents. You can use different containers and databases and compression algorithms for storing objects, and their hash will always be the same.
Loose objects under the objects/ subdirectories are zlib-compressed on disk, but the hash is taken over the uncompressed data. Each object begins with a header containing the object type and size. Having the object size in the header protects a bit against constructing a SHA1 collision (which involves appending a tail). Also, having the object type there makes sure objects of different types always have different hashes, even if their bit contents are identical.
You can pack the objects; they then also get delta-encoded. The packs go to the objects/pack/ directory.
"git commit" does NOT zip your working files. The blobs are normally already written by "git add"; all the commit does is calculate the hash of the root tree of your index (writing any new tree objects) and record that tree hash in your commit object.
'config' is not a folder, it's a file.
'index' is not a folder, it's a file. Since it's a local file, "20 people" will never try to add information there, causing need to merge. It's your own local file, independent of other people.
This video could have used a lot more visuals/ examples. It's weird to hear someone talking about merge conflicts, without showing any examples.
Especially since git's conflicts are inherently visual. They have arrows and such made out of equals signs and angle brackets, and lots of software will color-code the conflicts vs the normal contents. :)
Maybe the next video will visit the pack directory and show both its contents and the contents of the objects directory before and after running git gc --aggressive.
I've come across the .gitignore and README files while doing git init. What are they used for?
Any suggestions on a comprehensive course to learn Git?
Loved the description but the camera movement was awful. I got through to the end but now feel nauseous. Please don't record if you've forgotten your tripod!
So does every commit have its own unique tree, or are they all referring to the same global one?
Every commit has its own root tree.
The root tree changes with every commit that changes any file.
Merge conflicts are the bane of my existence. For reasons, just a few weeks ago, I ended up with 15 files filled with about 40 merge conflicts at work. It was an absolute nightmare to deal with. So painful.
But even worse than that are changes that aren't conflicts, but which mutually break one another at runtime after they've been auto-merged.
this happened to me at work too, what's the solution to avoiding this? smaller more incremental commits and more regular merges perhaps?
@@TheGreatAtario It is annoying, but that is why many shops require merging the target (such as master) into the source (such as your feature branch) first in order to check for that, then merging the resulting source back to the target only after you've verified it works. It's an imperfect workaround, because it spaghettifies your merge history, but the alternatives that don't leave a mess in the commit history are often harder to enforce on the more junior developers.
@@manvesh97 Yep. Commit after basically every change, and push only one feature/bugfix per branch, and when starting a new branch always begin with an up-to-date branch that's pulled all the latest changes. That's usually how we work anyway; it's just this one client who wanted changes from a version about 20 minor versions ahead of the one they were using, but didn't want to actually update to that version.
@@IceMetalPunk No commit strategy survives in first contact with a customer.
- General Patton.
Favourite spell: Conflictus resolvio!
Saying "minus" when reading command line flags is just going to confuse people. You're adding flags. He even says "dash t" at one point.
In the section about accessing other resources, I wonder how git handles symbolic links (I assume a linux environment).
Does anybody know what git actually records when it encounters a symbolic link? I'm especially interested in what happens when it encounters a symbolic link to a file outside the directory tree of the repository ("ln -s /some/other/folder/sensitive_data sensitive_data").
@@__Brandon__ Thanks, Brandon. I know what a sym link is and how it is implemented in Linux (and Unix before that). Since presumably git does not record individual inodes, I wonder what mechanism it uses to represent a symlink.
@@__Brandon__ : I appreciate your response. I fear I'm being unclear in my language. When a git transfer from one system to another occurs, there are only bits -- no directories, files, links, or anything else -- on the wire. When a git repo is cloned from one system to another, something special must be done with each symlink. The file systems of the origin and destination system of such a clone are likely to be entirely different. Suppose that a repo on system A has symlinks to some targets on system A that are outside the repo. Suppose that repo is cloned to system B. How is each symlink represented in the bits that cross the wire, and how is the representation expanded on system B. What happens on system B if a symlink target is not present on system B when the clone begins -- does git show a broken link?
Does git, for example, use a text representation that literally expands back into an "ln" command during the clone? I ask because these are sometimes security leaks in Linux systems. I've encountered situations where an attempted symlink fails one way if the target doesn't exist at all and a different way if the target exists and has permissions that block access. The difference between those reveals information about otherwise-private directory structures. If git allows unrestricted use of symlinks, then I wonder if there are ways to use its behavior while cloning to similarly leak information about a system.
I suppose all this goes under the rubric of "injection attack", as in "sql injection attack" -- using carefully constructed abuses of one mechanism to reveal information about another.
This was all triggered by the brief question about security concerns mentioned in the video (4:55).
@@thomasstambaugh5181 So, in addition to a symlink essentially just being a text file storing the target/destination, there's also the file "mode" which is what distinguishes it from a regular text file. Git also tracks that mode. When you clone/pull a repo, git creates the symlink even if the target/destination doesn't exist (creating a broken link).
It's no different than if you manually do ln -s to a non-existing target/destination.
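Here is a small Python sketch of that idea, assuming a POSIX system where symlinks can be created. It only illustrates the point made above (the stored "content" of a symlink is its target path, and the mode marks it as a link); the function name `symlink_blob_id` and the target path are made up for the example.

```python
import hashlib
import os
import tempfile

SYMLINK_MODE = "120000"  # the mode git records for a symlink entry in a tree


def symlink_blob_id(link_path: str):
    """Return (target, blob_id) for a symlink, without following the link.

    The blob body is just the target path as bytes; the target itself is
    never opened, so it does not have to exist.
    """
    target = os.readlink(link_path)
    body = os.fsencode(target)
    header = b"blob " + str(len(body)).encode() + b"\x00"
    return target, hashlib.sha1(header + body).hexdigest()


def demo():
    """Create a dangling symlink in a temp dir and describe it."""
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "sensitive_data")
        os.symlink("/some/other/folder/sensitive_data", link)  # target need not exist
        target, blob_id = symlink_blob_id(link)
        return SYMLINK_MODE, target, blob_id, os.path.exists(link)
```

Note that only the path string ends up in the repository, never the contents of whatever the link points at. The path itself can still reveal something about the directory layout outside the repo.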
@@jessodum3103 : Got it, thanks. In the context of the very brief comment in the video about security, this is perhaps worth mentioning. The presence of broken links in a git clone reveals information about the directory structure outside of the directory tree that comprises the git repo. If those links work when they're not supposed to (such as in a misconfigured system), then those links ARE breaches.
My point here is that some caution is advised when allowing git to commit symbolic links.
sudo nano HEAD to see the content? just cat the file
Im confused as to why this is a video but ill watch it anyways
Git some
This confirms that git is magic.
Im going to assume that git uses a linked list data structure. The word HEAD seems like a giveaway.
Yup.
At 3:48 you see that the commit (except for the initial one) has info about its parent.
It’s actually a “directed acyclic graph”. A commit can have a single parent, multiple parents (merge commit) or even no parents at all (like the first commit in a repo).
Trivia question: can there be more than one commit in a repo with no parents?
@@lawrencedoliveiro9104 yes, there's even a --root flag for git rebase, so you can create a new commit without parents if your rebase will modify the original one
@@lawrencedoliveiro9104 git checkout --orphan
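The graph structure discussed in this thread can be modelled as a plain mapping from each commit to its list of parents. A toy Python sketch (the commit names are made up; real commits are identified by hashes):

```python
from typing import Dict, List, Set

# commit -> parents. "m" is a merge commit; "a" and "o" both have no parents.
COMMITS: Dict[str, List[str]] = {
    "a": [],
    "b": ["a"],
    "c": ["a"],
    "m": ["b", "c"],
    "o": [],  # e.g. the first commit on an orphan branch
}


def ancestors(commits: Dict[str, List[str]], start: str) -> Set[str]:
    """Return `start` plus every commit reachable by following parent links."""
    seen: Set[str] = set()
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(commits[commit])
    return seen


def roots(commits: Dict[str, List[str]]) -> List[str]:
    """Return all commits that have no parents, in sorted order."""
    return sorted(c for c, parents in commits.items() if not parents)
```

Here `roots(COMMITS)` gives two parentless commits, which is the answer to the trivia question: nothing in the data structure limits a repo to one root.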
Git can create hard links (for example when cloning a repo on the same filesystem). Note that `ln -s` in the shell creates symbolic links, not hard links; a plain `ln` creates a hard link.
6:50 thats not true either. Git doesn't care about which files are changed. You just can't push to non-existent things. The moment a branch's head is changed on the remote, your local branch head is not gonna be the same thing and you have to update it first. This has nothing to do with the content of the files changed, in fact it could be an empty commit too.
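The rule described above can be sketched as a reachability check on the commit graph: a normal push is accepted only if the remote branch head is an ancestor of (or equal to) the commit you are pushing. This is a simplified Python model with made-up commit names; it ignores force pushes and any server-side configuration.

```python
from typing import Dict, List


def is_ancestor(parents: Dict[str, List[str]], maybe_ancestor: str, commit: str) -> bool:
    """True if `maybe_ancestor` is `commit` itself or reachable via parent links."""
    stack = [commit]
    seen = set()
    while stack:
        current = stack.pop()
        if current == maybe_ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return False


def push_would_fast_forward(parents: Dict[str, List[str]], remote_head: str, local_head: str) -> bool:
    """A plain push works only if it moves the remote head forward."""
    return is_ancestor(parents, remote_head, local_head)


# Two people both started from c2; "theirs" was pushed first.
HISTORY = {"c1": [], "c2": ["c1"], "mine": ["c2"], "theirs": ["c2"]}
```

With the remote still at `c2`, pushing `mine` is fine. Once the remote has moved to `theirs`, pushing `mine` is rejected no matter which files either commit touched, which matches the point that file contents play no part in the check.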
My lecturers at uni didnt even teach us git... we had a group project we were sharing around on a usb....
Now I finally git it.
Please do a whole video on teams using git.
This is the least informative Computerphile video I've seen.. cool, you've showed me some file structures, but didn't tell me how it actually works..
just an addition: if someone wants to change or add to the behaviour of these default commands, they can make use of the .git/hooks/ directory, e.g. to prefix/postfix the commit message, or to refuse a push if the commit message contains specific strings like [WIP]
Is that Kevin Kühnert?
The blur on the email is not sufficient. It's still readable.
i guess it's not actually a secret though, so maybe this doesn't matter
What version control do the git developers use?
git
perforce
I like exploring hitherto unknown places, where no human has set foot, the darkest dimmest recesses of untouched teritory yet to be found, hidden and harbouring magical secrets -
in short, *.git*
What's the hash based on btw. On the extracted files or the compressed?
A hash of an object is run over: 1) a prefix line containing the object type (blob, tree, commit, tag) and size; 2) the object's contents, in plaintext. As you see, objects of different type will have different hash, even if their binary contents is identical. Also, having the object size in the prefix line guards a bit against constructing a SHA1 collision.
The same header-plus-contents bytes, zlib-compressed, are what gets stored as loose objects under the .git/objects/XX/ subdirectories.
The objects can also be packed, compressed, delta-encoded. This doesn't change their hash, since it's always the plaintext hash.
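A minimal Python sketch of the above, assuming SHA-1 object IDs and the standard loose-object layout. The function names are made up for illustration, and nothing here touches a real repository.

```python
import hashlib
import zlib


def encode_object(obj_type: str, content: bytes) -> bytes:
    """Prefix the content with "<type> <size>\\0", the bytes that get hashed."""
    return f"{obj_type} {len(content)}".encode() + b"\x00" + content


def object_id(raw: bytes) -> str:
    """The object ID is the SHA-1 of the uncompressed header + content."""
    return hashlib.sha1(raw).hexdigest()


def loose_object_path(oid: str) -> str:
    """First two hex digits name the subdirectory, the rest name the file."""
    return f".git/objects/{oid[:2]}/{oid[2:]}"


def loose_file_bytes(raw: bytes) -> bytes:
    """What would be written to disk: the same bytes, zlib-compressed."""
    return zlib.compress(raw)


raw = encode_object("blob", b"hello world\n")
oid = object_id(raw)
```

Since the hash is taken before compression, changing the compression level or repacking the object later leaves `oid` untouched; `zlib.decompress(loose_file_bytes(raw))` gives back exactly `raw`.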
Why there is staging area in git?
No need for sudo or root privs. git is just a normal command like cat, grep, awk etc. Git commands manage a repository, which is nothing more than a collection of files you own, just like any other files. Everything in a .git/ directory belongs to your user the same as your project/code files. There’s nothing special about .git/ or its contents that requires any root privileges more than any other jpeg or text file.
I was just teaching our interns about git and GitHub.
i should mention the cards at 8:16 have poorly balanced volumes compared to the bulk of the video
my man clears the terminal by spamming enter
Oh I git it now.
His favourite chocolate? Git cat.
next video: inside the node_modules folder
Amazing stuff (!) - exactly the same sort of functionality we had in 1984 when I started working at IBM on an OS with ~400 people working simultaneously on the code.
In those days, it was believed you had to lock files when they were checked out, because if two people made changes to the same file, the world would end or something.
@@lawrencedoliveiro9104 I’m pretty sure we had the ability to have multiple updates to a module. Each module had an owner who was responsible for resolving any overlapping updates.
I would have to say, [citation needed].
@@lawrencedoliveiro9104 Tricky that - internal dev team product used in IBM Labs … citation is my memory! Recall that back in 1984 we also had a global network with 30,000 users, email, remote access, file transfer, and a Reddit like system for sharing random stuff. Folk like Mike Cowlishaw invented Rexx as a hobby. There used to be smart people in IBM Labs!
@@Richardincancale I’m sure there were smart researchers at IBM, but somehow their shipping products rarely reflected that -- they were clunky, resource-hungry and overcomplicated compared to, say, DEC and Unix systems.
git Hooks next?
Oh my God, is that Kevin Kühnert?!
Noice!
don't even git me started...
Sorry, but this video is not so well researched and it is wrong in several ways.
config is a text file, not a folder.
Is that an ouraring?