Git Guide: Technical
Welcome to this (relatively) short explanation of the technical "behind-the-scenes" side of Git. Git is an open-source Source Code Management (SCM) tool that scales considerably better than most alternatives, although its interface is somewhat hard to understand. It was originally developed for the Linux kernel, because managing that code had gotten out of hand, but it turned out that many people needed better source control, so it is now the most widely used SCM in the world.
Besides Git, there are tools like GitHub, GitLab and Bitbucket, which are (online) platforms for hosting Git repositories. In addition to hosting them, they often have many extra features like issue/bug lists, pull requests (PRs, also called merge requests), reviewing, task management boards and sometimes entire social media platforms. Note that (confusingly) some features that you use a lot (like PRs) are GitHub features and not Git features. With that out of the way, I'm going to get started on how Git works!
I will have some "Try yourself" sections where you can try examples on your own Git repo and "Git command" sections that explain what a high-level command does (e.g. git commit or git reset).
Blobs, Trees and Hashes
Before thinking about commits and different versions of the code, just imagine there is only one version of your code (the one you have in the directory of your project). If you have one file with source code, then Git will store this source code in a "blob" (not the file itself, only the contents of the file). Blobs are stored in the directory .git/objects. To identify a blob, Git computes a SHA-1 hash of its contents, such as 7fd015... etc. The hash is not strictly guaranteed to be unique, but an accidental collision is so unlikely that Git treats it as unique. Git usually displays only an abbreviated prefix of the hash (7 hex digits by default).
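To make the hashing step concrete, here is a minimal Python sketch of how a blob's ID can be computed. It assumes a repository that uses SHA-1 (the default); the function name git_blob_hash is made up for this illustration. Git does not hash the raw contents alone, but the contents prefixed with a small header.

```python
import hashlib


def git_blob_hash(content: bytes) -> str:
    """Return the SHA-1 object ID Git would assign to a blob with this content."""
    # Git hashes a header of the form "blob <size in bytes>\0" followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()


# The same ID that `git hash-object` reports for a file containing "hello world\n":
print(git_blob_hash(b"hello world\n"))  # 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

Because the file name is not part of the input, two files with identical contents share a single blob.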
Of course your repo does not consist of one file with source code, so there are also "trees". A tree is kind of like a directory, where each entry has a name such as src or main.ts, which in turn refers to another object (a tree or blob). For a tree, Git also calculates a unique(-ish) hash.
Even if you have 10000 committed versions of a file that each differ by only 1 character, Git will store each version of this file's content in a separate blob object, meaning you have 10000 copies that are almost identical, since each version will have a completely different hash. Fortunately, Git compresses objects and can store similar ones as deltas in so-called packfiles, so this does not take up 10000 times as much disk space, but conceptually you should think of them as different copies.
Try yourself: go to any git repository and then use the following command in a terminal:
$ tree /F .git/objects
# if Bash: ls -R .git/objects
You might see something like this:
├───00
│ b7741d13a2a8edf0b8aa457be25c1b05bb1de2
│
├───04
│ c5e626e6abe8006e052fa26b645344a89f6c36
│
├───07
│ 723145fc8d316195072cd20237428759c2e07e
│ a58ac8d78fd0adcbf6a80f8a23a1156d795f5a
etc.
Now just pick any entry, take the first 6 digits of its hash, starting with the 2 digits of the directory name (for instance, 00b774 or 04c5e6), and run these commands:
$ git cat-file -t 04c5e6
tree
$ git cat-file -p 04c5e6
100644 blob 3e354001a908625dd3fe5a7cd3fda1b283f2d953 README.md
040000 tree 90c6615d02ebf8c948a5fdd01b6bb3ca012f4f9f src
(etc.)
Some of them will give you trees, others will give blobs, others will give commits. Running git cat-file -p shows the data stored in a Git object, while -t shows whether it is a blob, tree or commit.
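What git cat-file does for such a "loose" object can be approximated in a few lines of Python. This is only a sketch: the function name read_loose_object is made up, it needs the full 40-digit hash, and it does not handle objects that Git has already moved into packfiles.

```python
import zlib
from pathlib import Path


def read_loose_object(git_dir: str, object_id: str) -> tuple[str, bytes]:
    """Return (type, body) of a loose object, given its full 40-digit ID."""
    # Loose objects live at <git_dir>/objects/<first 2 digits>/<remaining 38 digits>.
    path = Path(git_dir) / "objects" / object_id[:2] / object_id[2:]
    raw = zlib.decompress(path.read_bytes())
    # The decompressed data has the form "<type> <size>\0<body>".
    header, _, body = raw.partition(b"\0")
    object_type, _, size = header.partition(b" ")
    if int(size.decode("ascii")) != len(body):
        raise ValueError("size in header does not match the body length")
    return object_type.decode("ascii"), body
```

Calling read_loose_object(".git", some_full_hash) on a blob returns ("blob", file contents); for a commit, the body is the readable text that git cat-file -p prints, while tree bodies are stored in a binary format.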
Commits
In my opinion, "commit" is not a very good name; "snapshot" probably describes better how it works. Each commit is a complete snapshot of everything in your project.
Implementation-wise, a commit object mainly stores the hash of a single tree object, namely the tree object that describes the corresponding version of your project directory. It also has a bit of extra data, such as the commit message and author of the commit.
Finally, a commit also has a parent (except if it's the first commit ever, then it has no parent; merge commits have more than one). This is the hash of the commit object that came before this commit.
In the .git folder you will also find a file called "HEAD". If you open it, it will show you a reference to some branch (e.g. ref: refs/heads/main) or the hash of a commit (in the latter case, you might see something like "Detached HEAD" in a Git client). HEAD refers to the commit that you are currently on.

Git command log: if you use git log --oneline or you view the history of commits in GitHub Desktop, it essentially does a simple walk along the parent links:
currentCommit = HEAD
print(currentCommit.hash, currentCommit.message)
while (currentCommit.parent exists) {
    currentCommit = currentCommit.parent
    print(currentCommit.hash, currentCommit.message)
}
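The walk above can be written as runnable Python on a toy, in-memory history. This is a sketch under simplifying assumptions: the Commit class, the dictionary of commits and the example hashes are all made up for illustration, and every commit has at most one parent (so merge commits are ignored).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Commit:
    hash: str
    message: str
    parent: Optional[str]  # hash of the parent commit, None for the first commit


def log_oneline(commits: dict[str, Commit], head: str) -> list[str]:
    """Follow the parent links from head and return one line per commit."""
    lines = []
    current: Optional[str] = head
    while current is not None:
        commit = commits[current]
        lines.append(f"{commit.hash[:7]} {commit.message}")
        current = commit.parent
    return lines


# Hypothetical three-commit history; the hashes are invented.
history = {
    "aaaaaaa1": Commit("aaaaaaa1", "Initial commit", None),
    "bbbbbbb2": Commit("bbbbbbb2", "Add README", "aaaaaaa1"),
    "ccccccc3": Commit("ccccccc3", "Fix typo", "bbbbbbb2"),
}
for line in log_oneline(history, "ccccccc3"):
    print(line)
```

The newest commit is printed first, just like in git log.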
Tags and Branches
A tag is essentially just a nice name for the hash of an object (usually a commit). A branch is the same thing. The only difference is that when you use commands like git commit or git reset, Git will automatically update the current branch for you, while tags stay where they are unless you move them yourself. Git stores tags in .git/refs/tags, while it stores branches in .git/refs/heads.
Try yourself: run the following in any git repo:
# Create branch or tag called "example":
$ git branch example # or git tag example
$ cat .git/refs/heads/example # or .git/refs/tags/example
# This is the hash of a commit:
b4aa4d1fb12f770e90d5fc27fea64e21e364368d
$ git cat-file -p b4aa4d # first 6 digits
tree 927c472a22118dcb365ace4232c0137d5484e833
parent e2b52132f4f122bf054bddb314b502b04bfc3806
author (SOME INFO)
committer (SOME INFO)
Message of last commit you made
So you can see that a branch/tag is really just an alias for the hash of a commit object.
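Git command branch/tag: when you use git branch branch-name, it essentially does the following:
hash = HEAD.hash
createFile(".git/refs/heads/(branch-name)", content=hash)
Note that this does not switch to the new branch: HEAD stays where it was. To create a branch and switch to it in one go, use git checkout -b branch-name (or git switch -c branch-name), which additionally does:
HEAD = "refs/heads/(branch-name)"
The git tag tag-name command works the same way as git branch, except that it creates the file in .git/refs/tags. You can use git branch --list or git tag --list to see the branches/tags you created.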
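Since HEAD, branches and tags are just small files, finding the commit you are on can be sketched in Python by following those files. The function name resolve_ref is made up, and the sketch assumes that each ref exists as its own file; Git may also store refs in a single .git/packed-refs file, which is not handled here.

```python
from pathlib import Path


def resolve_ref(git_dir: str, name: str = "HEAD") -> str:
    """Follow symbolic refs such as HEAD until a commit hash is found."""
    content = (Path(git_dir) / name).read_text().strip()
    if content.startswith("ref: "):
        # Symbolic ref, e.g. "ref: refs/heads/main": look up the branch file it names.
        return resolve_ref(git_dir, content[len("ref: "):])
    # Otherwise the file contains a hash directly (a branch, a tag or a detached HEAD).
    return content
```

For example, resolve_ref(".git") gives the hash of the current commit, and resolve_ref(".git", "refs/heads/example") gives the commit that the branch "example" points to.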

Git command commit: when you use git commit, it essentially appends a new node to the commit history and sets the parent of this new commit to the last commit:
commit = new Commit()
commit.tree = currentTree.hash
commit.parent = HEAD.hash
storeGitObject(commit)
if (HEAD refers to some branch B) { # i.e. not in detached HEAD state
    make B refer to commit.hash # HEAD keeps referring to B
} else {
    make HEAD refer to commit.hash
}
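As a runnable version of this pseudocode, here is a small in-memory model in Python. It is a toy, not how Git is implemented: the Repo class and its fields are invented for this sketch, trees are represented by plain strings, and the commit ID is a simplified hash rather than the real object format.

```python
import hashlib
from typing import Optional


class Repo:
    """Toy model of the refs and commits involved in `git commit`."""

    def __init__(self) -> None:
        self.commits: dict[str, dict] = {}
        self.branches: dict[str, Optional[str]] = {"main": None}
        self.head = "refs/heads/main"  # a branch ref, or a bare commit ID when detached

    def head_commit(self) -> Optional[str]:
        if self.head.startswith("refs/heads/"):
            return self.branches[self.head[len("refs/heads/"):]]
        return self.head

    def commit(self, tree: str, message: str) -> str:
        parent = self.head_commit()
        # Simplified ID: the real one hashes the full commit object, including the author.
        commit_id = hashlib.sha1(f"{tree}|{parent}|{message}".encode()).hexdigest()
        self.commits[commit_id] = {"tree": tree, "parent": parent, "message": message}
        if self.head.startswith("refs/heads/"):
            # Attached HEAD: move the branch, HEAD keeps pointing at the branch.
            self.branches[self.head[len("refs/heads/"):]] = commit_id
        else:
            # Detached HEAD: only HEAD itself moves.
            self.head = commit_id
        return commit_id
```

After two calls to commit on a fresh Repo, the second commit has the first as its parent and the branch main points to the second commit.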

Git command checkout: when you use git checkout branch-name or git switch branch-name or you switch to a different branch in GitHub Desktop, it essentially does HEAD = "refs/heads/branch-name" and updates the files in your project folder to match that branch (uncommitted changes are carried over to the new branch where possible).
Additionally, you can do git checkout aa1e3a (or with any other hash) to make the HEAD refer to that commit. Git or a Git client will then tell you that you are in "detached HEAD state", meaning that your HEAD does not refer to a branch, but to a "loose" commit.
Git command reset: you can instead use git reset to move back to another commit (I don't think there is an equivalent for this in GitHub Desktop). Instead of moving HEAD to another commit, this moves the branch that you are on to another commit (assuming your HEAD is not detached). For instance, you can use git reset --hard HEAD~2 to move the branch you are on back to the grandparent (~2) of the HEAD. (Using --soft instead of --hard keeps the changes from the commits you moved past as staged changes in your project folder, while --hard discards them, along with any uncommitted changes.)
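The difference between checkout and reset is easiest to see when only the references are modelled. The following Python sketch uses an invented state dictionary and made-up commit names (c1, c2, c3), and it ignores the project folder and staging area entirely.

```python
def checkout(state: dict, target: str) -> None:
    """Move HEAD only; every branch stays where it is."""
    if target in state["branches"]:
        state["head"] = "refs/heads/" + target
    else:
        state["head"] = target  # a commit hash: detached HEAD


def reset(state: dict, commit: str) -> None:
    """Move the current branch; HEAD stays attached to that branch."""
    head = state["head"]
    if head.startswith("refs/heads/"):
        state["branches"][head[len("refs/heads/"):]] = commit
    else:
        state["head"] = commit  # detached HEAD: only HEAD itself can move


# Hypothetical repo: main points to c3, whose grandparent is c1.
state = {"head": "refs/heads/main", "branches": {"main": "c3"}}
reset(state, "c1")
print(state)  # {'head': 'refs/heads/main', 'branches': {'main': 'c1'}}
```

With checkout(state, "c1") instead, main would still point to c3 and only HEAD would change.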
Rewriting History
Now finally, there are some cool things that you can do with Git which will surely be useful at some point. Note that when you rewrite history, your local clone of the repo will diverge from the remote repository (assuming you are working with a remote), and you have to run git push -f instead of just git push. This means that if other people are working on the same branch as you, it will mess up their history. Therefore I strongly suggest you never force-push to the main branch unless you are working on your own.

Git command commit --amend: you can use git commit --amend to not create a new commit on top, but to replace the current (HEAD) commit instead. When you have added your changes with git add or a client like GitHub Desktop, you can use this flag to rewrite history. Strictly speaking the old commit is not modified; a new commit with a new hash takes its place. It essentially does the following:
commit = new Commit()
commit.tree = currentTree.hash
commit.parent = HEAD.parent.hash # NOTE only difference: HEAD.parent
storeGitObject(commit)
if (HEAD refers to some branch B) { # i.e. not in detached HEAD state
    make B refer to commit.hash # HEAD keeps referring to B
} else {
    make HEAD refer to commit.hash
}

Git command rebase: imagine you are working on two different branches, main and feature1. The Git tree might look like this, where each node is a commit:

Assuming your HEAD refers to feature1, if you use git rebase main (or use GitHub Desktop for this), the graph will change as follows (it finds the least common ancestor, i.e. the latest commit that both branches originate from, and replays the commits made on feature1 since then on top of main as new commits with new hashes):


Now that you have rebased feature1 onto main, it is really easy to merge a pull request for the feature1 branch; after all, Git can simply do the following (a so-called fast-forward):
main = feature1
So that after merging, the history looks like this:

Git command rebase -i: There is also something called "interactive rebase", which also rewrites history, but its name is misleading in the sense that it does not necessarily change the base of the current branch. You can also just see it as a general history rewriting tool. When you do something like git rebase -i HEAD~4, it will ask you what you would like to do with the last 4 commits (e.g. "squash", "drop", "reword", "edit" and others [2]). When you close the text editor, it will then go over each commit that is changed and prompt you again. If you are finished with editing, you can then use git rebase --continue to continue, or you can go to GitHub Desktop and click on the Continue button. You can always do git rebase --abort if it turns out you messed something up.
Finally, there are other history rewriting operations you can do, like git cherry-pick, but you should refer to the Git documentation for this [1].
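To illustrate what a plain git rebase main does to the commit graph, here is a Python sketch on a toy history. Everything in it is a simplifying assumption: commits are just names in a parents dictionary, each commit has one parent, and a replayed commit is modelled by appending an apostrophe to its name (in real Git it gets a new hash and conflicts may occur).

```python
from typing import Optional


def ancestors(parents: dict[str, Optional[str]], commit: str) -> list[str]:
    """Return the commit and all its ancestors, newest first."""
    chain = []
    current: Optional[str] = commit
    while current is not None:
        chain.append(current)
        current = parents[current]
    return chain


def merge_base(parents: dict[str, Optional[str]], a: str, b: str) -> Optional[str]:
    """Latest commit that both a and b descend from."""
    seen = set(ancestors(parents, a))
    for commit in ancestors(parents, b):
        if commit in seen:
            return commit
    return None


def rebase(parents: dict[str, Optional[str]], branch_tip: str, onto: str) -> str:
    """Replay the commits unique to branch_tip on top of onto; return the new tip."""
    base = merge_base(parents, branch_tip, onto)
    to_replay = []
    for commit in ancestors(parents, branch_tip):
        if commit == base:
            break
        to_replay.append(commit)
    new_tip = onto
    for commit in reversed(to_replay):  # oldest first
        copy = commit + "'"  # stands in for a new commit with a new hash
        parents[copy] = new_tip
        new_tip = copy
    return new_tip


# Hypothetical history: A <- B <- C (main), and B <- D <- E (feature1).
parents = {"A": None, "B": "A", "C": "B", "D": "B", "E": "D"}
feature1 = rebase(parents, "E", "C")
print(ancestors(parents, feature1))  # ["E'", "D'", 'C', 'B', 'A']
```

After the rebase, main (C) is an ancestor of feature1, which is why merging feature1 into main becomes a simple pointer move.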
Remotes
Now, for a completely different topic: if you work with GitHub, the repository is hosted somewhere else, because often you want more than one person working on the same repository. When you have another copy of the repository somewhere else, it is called a remote. Usually this remote is called "origin" and it just points to the URL of the repo hosted on GitHub/GitLab/etc.
Another feature of branches is that they can "track" a branch on the remote. So for instance, main will probably track origin/main.

Git command pull: when you use git pull origin, it essentially does as follows (assuming you are on main, which tracks origin/main):
# git fetch simply gets all git objects from the remote that didn't exist locally yet
git fetch
ancestor = leastCommonAncestor(main, origin/main)
if (ancestor == main) { # if origin/main is directly "ahead" by >= 0 commits
    main = origin/main
} else if (ancestor == origin/main) { # if main is ahead of origin/main
    # nothing to do, already up to date
} else { # if branches diverge
    error("main and origin/main diverge by X and Y commits respectively")
}
Thus it will "fast-forward" the local branch to the remote branch, but only if origin/main is a descendant of main (i.e. main contains no commits that the branch it is tracking lacks). Depending on your configuration, Git may instead merge or rebase when the branches diverge.
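The fast-forward check can be sketched in Python on a toy history. As before, the parents dictionary and the commit names are invented, each commit has one parent, and the error for diverged branches stands in for whatever your Git configuration does in that case.

```python
from typing import Optional


def is_ancestor(parents: dict[str, Optional[str]], ancestor: str, descendant: str) -> bool:
    """True if ancestor is reachable from descendant via parent links."""
    current: Optional[str] = descendant
    while current is not None:
        if current == ancestor:
            return True
        current = parents[current]
    return False


def fast_forward(parents: dict[str, Optional[str]], local: str, remote: str) -> str:
    """Return the commit the local branch should point to after a fast-forward-only pull."""
    if is_ancestor(parents, local, remote):
        return remote  # remote is ahead (or equal): move the local branch forward
    if is_ancestor(parents, remote, local):
        return local  # local is ahead: already up to date
    raise RuntimeError("local and remote branches have diverged")


# Hypothetical history: A <- B <- C, plus X branching off from A.
parents = {"A": None, "B": "A", "C": "B", "X": "A"}
print(fast_forward(parents, "A", "C"))  # C
```

Calling fast_forward(parents, "X", "C") raises the error, because neither commit is an ancestor of the other.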

Git command push: in contrast, git push origin kind of does the reverse: first it will transfer git objects (blobs, trees, commits) that do not already exist on the remote; after that, if your local branch main is a descendant of origin/main (i.e. main contains every commit of the branch it is tracking, possibly plus new ones), it will set main on the remote, and with it origin/main, to point to the same commit as your local branch main.
See here for commands for changing properties of the remote [3].
Example Workflow
This is one example of how you can collaborate using Git. Typically there will be one main branch and every collaborator can branch off of that one. That means to start a new feature, you do as follows:
1. Create a new branch and check it out: git checkout -b new-feature
2. Write the code for a commit
3. Use git add <filenames> to select changes for your commit and then use git commit -m "message"
4. Use git push to push your changes to the remote (the first time, use git push -u origin new-feature so that the branch tracks the remote)
5. Repeat from 2. until done
Then regularly, you should do the following (because if you do not, you will diverge from the main branch and get a lot of merge conflicts when you issue a PR):
1. git checkout main (if you have uncommitted changes, you can set them aside with git stash push and restore them later with git stash pop)
2. git pull to get the newest version of main
3. git checkout new-feature
4. git rebase main to base HEAD (i.e. new-feature) off of main; when it shows merge conflicts, you should fix those, use git add name-of-file.ext to mark that you are done and finally use git rebase --continue to keep going
5. When rebasing is finished and you are satisfied with the rebase (i.e. your code still works as intended), your local branch is now up-to-date with the latest version of main, but the remote branch is not; if nobody else was working on the same branch as you, you can use git push -f to update the remote as well
Then when you are done, you can create a PR and you should have no merge conflicts.