Git Branches: Intuition and Reality

While the explanation is right in some sense, it misses a few points.

Branches are pointers to a commit and that pointer is refreshed when a new commit is created. One could say they are a wandering tag (without explaining a tag for now).

The actual chain of commits that represents what we see as a branch comes from the commits themselves. Those commits point back to their parent commit.

And then one can see why no branch has any special meaning: It is a chain of related commits with a named entrypoint. Once you delete a branch (i.e. the named wandering pointer to a commit), you cannot identify a branch as such anymore. It is just a chain of related commits without a named label now. And nothing besides the name distinguished the branch from other commit chains before.

The master/dev/release branches are then a convention to keep an updated commit pointer on the chain of commits containing changes of interest.
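A toy Python model may help make the points above concrete. It is only a sketch under simplifying assumptions (no merges, no file contents, hypothetical names such as `Commit`, `commit` and `log`), not how git actually stores anything: each commit holds a reference to its parent, and a branch is nothing more than an entry in a dict of names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Commit:
    message: str
    parent: Optional["Commit"] = None  # single parent only: merges ignored here


branches: dict[str, Commit] = {}  # hypothetical stand-in for .git/refs/heads


def commit(branch: str, message: str) -> Commit:
    """Create a commit on top of the branch tip, then move the branch pointer to it."""
    new = Commit(message, branches.get(branch))
    branches[branch] = new
    return new


def log(start: Optional[Commit]) -> list[str]:
    """Follow parent pointers back from a commit; that chain is what we call 'the branch'."""
    messages = []
    while start is not None:
        messages.append(start.message)
        start = start.parent
    return messages


commit("main", "initial")
commit("main", "add README")
branches["feature"] = branches["main"]  # creating a branch copies a single pointer
tip = commit("feature", "experiment")

print(log(branches["feature"]))  # ['experiment', 'add README', 'initial']
del branches["feature"]          # deleting the branch deletes only the name...
print(log(tip))                  # ...the chain of commits is still intact
```

Nothing in the commits records which name pointed at them, which is exactly why a deleted branch leaves an anonymous chain behind.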

For years I was deeply annoyed by the terrible name “branch” for something that acts more like a bookmark (or “wandering tag” indeed!).

And then I learned that git branches are branches in exactly the same way that the first element of a linked list in C “is” the linked list. Git was made by C people and they’re used to referring to entire data structures by way of some root element.

I mean that doesn’t make me dislike the name any less but at least now I see where they were coming from.

When the entire structure of commits is called a tree, I find the name "branch" fitting. The branch is identified by its head commit, so the path from head to root is uniquely defined and that's the branch. (Disregarding merges for now.)

> Disregarding merges for now.

Without disregarding them, it's not a tree, but a DAG.

It is a tree. What makes you think it's just a DAG? Are there commits with multiple parent commits or what?

There absolutely can be. Merge commits have multiple parent commits for example. It's definitely a graph not just a tree.

Parent comment was about disregarding merge commits.

You can't disregard merge commits. It's part of the structure.

Yes. Merge commits have two parents.

Two or more. I'm not sure there's a limit.

Try not to do this (imagine a 5-way merge conflict).

Leaning into the tree metaphor (and following the precedent of other version control systems), git should have used the term trunk instead of master or main.

By following this metaphor, a trunk would be the original commit, whereas branches are the tree branch tips.

And consider that because every folder can have its own .git subfolder, we can have several parallel trunks/repositories at the same time, meaning we have a forest.

I haven't really seen specific guidelines as to when and why anyone should start a new repo. Are there some? Or is a "mono-repo" the best solution, and if so, when and for whom? Surely we need more than one git repo in the whole world?!

Why? That would heavily imply that master/main is somehow technically different from all other branches (since a trunk is certainly not a branch), which to my knowledge is not true.

FWIW “tree” has a specific, different meaning in Git. It’s an object recording the contents of a directory.

Well yes if you disregard the thing that makes it not a tree then it is a tree. But you can't disregard that!

That's (most probably) where the "head" terminology comes from, too.

Yes, you are correct. It traces back to Allen Newell.

> Git was made by C people

This is why I think of branches as pointers. The file contents are literally just a pointer to a commit on the DAG.

> acts more like a bookmark

In fact, Mercurial uses the term “bookmark” for its lightweight, git-like branching. Mercurial’s branches have slightly different semantics and can’t be deleted like bookmarks or git branches.

> Git was made by C people and they’re used to referring to entire data structures by way of some root element.

FWIW this is actually backwards. The word "branch" was already in common use (to refer to the same basic idea) in SCM systems going back decades, and in almost all of those a "branch" was indeed a first class object with its own data that acted as a "container" for commits, both semantically and physically.

The fact that a "branch" is just a pointer is in fact a git innovation on top of the former idea.

I don't see the problem?

Real natural tree branches grow.

If you move the branch to point somewhere else, then it's better/more accurately said that you changed the name to refer to a different branch. We can think of branches as the chains of commits; it's the names we give them that 'wander' both as we commit and if we move them to a different branch. But merging (as it were!) the concepts of branches and names for their tips is convenient, and often equivalent/inconsequential.

This was the most useful piece of information that I have ever read about Git.

But what happens if you merge branch A into branch B? A and B will both contain the commits of A, but in B there may be commits of B between the commits that were merged. Do the same commits of A then have different parents depending on which branch they are on?

Merging branch A into branch B does two things:

1. Create a new merge commit with two parents: the commit pointed to by A and the commit pointed to by B.

2. Set branch B to point at the new merge commit.

This is a non-linear history; when comparing some commits there isn't a "before" or "after."
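The two steps can be sketched in Python as follows. This is a hypothetical model (the `Commit` class and `merge` function are made up for illustration) that ignores how file contents are combined. One detail the list leaves implicit: git records the tip of the branch you merge into (B) as the first parent. The existing commits of A are not touched at all, which is also the answer to the question about A's parents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple["Commit", ...] = ()  # 0: root, 1: normal, 2+: merge (3+: "octopus")


def merge(refs: dict[str, Commit], into: str, *others: str) -> Commit:
    """Merge the tips of `others` into branch `into`; content resolution is omitted."""
    # Step 1: a new commit; the current tip of `into` becomes the first parent.
    parents = (refs[into],) + tuple(refs[name] for name in others)
    merged = Commit(f"Merge {', '.join(others)} into {into}", parents)
    # Step 2: only the branch we merged into moves; A's pointer and commits stay as they were.
    refs[into] = merged
    return merged


root = Commit("root")
refs = {"A": Commit("a1", (root,)), "B": Commit("b1", (root,))}
a_tip, b_tip = refs["A"], refs["B"]
m = merge(refs, "B", "A")
print(m.message, [p.message for p in m.parents])  # Merge A into B ['b1', 'a1']
```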

If I checkout the merge commit and then do a ‘git log -n 5’, which parent pointer is followed to show the previous commit logs? A or B or Both / all if it is more than a 2-way merge?

By default it's by commit date.

Other ordering is possible, e.g. --topo-order.

As I recall, it shows commits from both parents in order of commit date.
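A rough Python sketch of that default behaviour: starting from the merge commit, follow every parent and always emit the newest pending commit next. The timestamps and names are invented, and real `git log` has further subtleties (plus options such as `--topo-order` or `--first-parent`), so treat this as an approximation rather than git's actual algorithm.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    timestamp: int  # committer date in seconds (invented values below)
    parents: tuple["Commit", ...] = ()


def log(tip: Commit, n: int) -> list[str]:
    """Newest-first walk over all parents, approximating plain `git log -n <n>`."""
    seen = {id(tip)}
    queue = [(-tip.timestamp, id(tip), tip)]  # id() breaks ties without comparing Commits
    out = []
    while queue and len(out) < n:
        _, _, c = heapq.heappop(queue)
        out.append(c.message)
        for p in c.parents:
            if id(p) not in seen:
                seen.add(id(p))
                heapq.heappush(queue, (-p.timestamp, id(p), p))
    return out


root = Commit("root", 100)
a1 = Commit("a1", 200, (root,))
a2 = Commit("a2", 300, (a1,))
b1 = Commit("b1", 150, (root,))
merge = Commit("merge A into B", 400, (b1, a2))
print(log(merge, 5))  # ['merge A into B', 'a2', 'a1', 'b1', 'root']
```

Commits from both parents interleave by date, which is why a merged history can look shuffled in plain `git log`.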

To complete the answer, the only difference if you merge branch B into branch A is that A is advanced to the new commit instead.

I keep repeating this every time someone talks about git and finds something weird or doesn't get branches, so I'm really glad your parent mentioned it as well and I know there's someone else out there that "gets" that:

    In git it's all just labels/pointers

It's not useful at all to think about branches as the user sees them as "things" of their own. Branches don't "have" anything. Branches in that sense are just convenient labels.

Of course actual "branches" in the commit tree exist whether you label them or not. Until `git` does a garbage collection and gets rid of anything that doesn't have a pointer ultimately leading to it - something that a human would understand aka branch/tag. And that's why we call these labels "branches" as well but it's actually one word for two things here. The actual tree branch and the label that's called branch.

And a branch and a tag are basically the same exact thing underneath, just a file in the `.git` directory somewhere that contains a commit hash. All the meaning and differentiation of branch or tag is just in the human brain and how we and our tools treat them. For example, if you look at a particular commit in your tool of choice, it will tell you which branch it's part of. To create a branch you can literally just create a thousand randomly named files in the right part of the `.git` directory containing the same commit hash and suddenly this commit "is on all those branches". That's what git does and why creating a branch in git is so super fast.
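To make the "just a file" point concrete, here is a small Python sketch that writes ref files into a throwaway temporary directory laid out like `.git/refs/heads`. It is an illustration only: the helper names are hypothetical, the hash is an arbitrary example value, and a real repository may also keep refs in a `packed-refs` file, so on a real repo you would use `git branch` or `git update-ref` rather than writing files by hand.

```python
import tempfile
from pathlib import Path


def create_branch(git_dir: Path, name: str, commit_hash: str) -> None:
    """A loose branch ref is a single file whose content is a commit hash."""
    ref = git_dir / "refs" / "heads" / name
    ref.parent.mkdir(parents=True, exist_ok=True)  # "topic/0" becomes a subdirectory
    ref.write_text(commit_hash + "\n")


def branches_at(git_dir: Path, commit_hash: str) -> list[str]:
    """Names of all loose branch refs pointing at exactly this commit."""
    heads = git_dir / "refs" / "heads"
    return sorted(
        p.relative_to(heads).as_posix()
        for p in heads.rglob("*")
        if p.is_file() and p.read_text().strip() == commit_hash
    )


with tempfile.TemporaryDirectory() as tmp:
    git_dir = Path(tmp) / ".git"  # throwaway directory, NOT a real repository
    example_hash = "3b18e512dba79e4c8300dd08aeb37f8e728b8dad"  # arbitrary example value
    for i in range(3):
        create_branch(git_dir, f"topic/{i}", example_hash)
    names = branches_at(git_dir, example_hash)

print(names)  # ['topic/0', 'topic/1', 'topic/2']
```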

To make things more complicated, the word "tag" is also overloaded. It can either be just a reference (in git lingo, a "ref") to a commit - just like a branch, only differing from it in how the tools treat it; but they can also be "annotated tags" which are pointing to a special tag object which contains some metadata and only then points to a specific commit (or other kind of object...) :)

You’ll also see the first type referred to as “lightweight tags”, if that helps anyone :)
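The two kinds of tag can be modeled like this. It is a hedged sketch: `Commit` and `TagObject` are simplified stand-ins (a real tag object also records the target's type and a tagger date), and the hash and tagger are made up. "Peeling" means following tag objects until a commit is reached, which is roughly what the `v1.0^{commit}` revision syntax asks git to do.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Commit:
    sha: str


@dataclass(frozen=True)
class TagObject:
    """Annotated tag: an object of its own, with metadata and a target."""
    name: str
    tagger: str
    message: str
    target: Union["Commit", "TagObject"]


def peel(obj: Union[Commit, TagObject]) -> Commit:
    """Follow annotated-tag objects until a commit is reached."""
    while isinstance(obj, TagObject):
        obj = obj.target
    return obj


c = Commit("abc123")  # made-up short hash
tags = {
    "v1.0-light": c,                                          # lightweight: ref -> commit
    "v1.0": TagObject("v1.0", "Jane Doe", "Release 1.0", c),  # annotated: ref -> tag object -> commit
}
print({name: peel(obj).sha for name, obj in tags.items()})  # {'v1.0-light': 'abc123', 'v1.0': 'abc123'}
```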

> Do the same commits of A then have different parents depending on which branch they are on?

Absolutely not. Commits are immutable (representing whole repo state, not a diff), and branches are just (mutable) pointers to them.

As the sibling already noted, a merge commit is just a regular commit. It simply points to multiple parents, "merging" them. Aside from the whole machinery to resolve conflicts etc., that's pretty much all there is to it.

When your graph topology allows it, you can also merge branches without generating a new commit (so-called "fast-forward" merges); such a merge does nothing but rewrite the branch pointer. You can also create merge commits that point to more than two parents ("octopus" merges). Reconciling the commits' content can get quite complicated in such cases, but from the repo graph perspective it's nothing special.
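Whether a merge can be a fast-forward comes down to one graph question: is our tip an ancestor of theirs? A minimal Python sketch, assuming a made-up `Commit` type and ignoring file contents entirely (a real non-fast-forward merge also has to combine the trees):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    message: str
    parents: tuple["Commit", ...] = ()


def is_ancestor(a: Commit, b: Commit) -> bool:
    """True if `a` is `b` or can be reached from `b` via parent pointers."""
    stack, seen = [b], set()
    while stack:
        c = stack.pop()
        if c is a:
            return True
        if id(c) not in seen:
            seen.add(id(c))
            stack.extend(c.parents)
    return False


def merge(refs: dict[str, Commit], into: str, other: str) -> str:
    ours, theirs = refs[into], refs[other]
    if is_ancestor(theirs, ours):
        return "already up to date"
    if is_ancestor(ours, theirs):
        refs[into] = theirs  # fast-forward: no new commit, the pointer just moves
        return "fast-forward"
    refs[into] = Commit(f"Merge {other}", (ours, theirs))  # diverged: merge commit
    return "merge commit"


root = Commit("root")
feat = Commit("feature work", (root,))
refs = {"main": root, "feature": feat}
first = merge(refs, "main", "feature")   # main is simply behind feature
refs["main"] = Commit("hotfix", (refs["main"],))
refs["feature"] = Commit("more work", (feat,))
second = merge(refs, "main", "feature")  # histories have diverged
third = merge(refs, "main", "feature")   # feature is already contained in main
print(first, second, third, sep=" / ")   # fast-forward / merge commit / already up to date
```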

> Commits are immutable (representing whole repo state, not a diff)

To make things more clear: Repo state here is the contents of all files, and some metadata including a pointer to the previous commit.

So a commit hash uniquely identifies not only a set of files but the unique history leading up to it! That's why some people like to call git the original blockchain (there's no proof of work involved, of course, so it can never be used for payments or anything like that, but the Merkle tree bit is similar enough).
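A toy illustration of that chaining, in Python. The `commit_id` function is hypothetical and deliberately simplified: real git hashes a commit object containing the tree id, the parent ids, author, committer and message, in its own format. The idea carries over, though: because each id covers the parent's id, editing an old snapshot changes the id of every commit after it.

```python
import hashlib
from typing import Optional


def commit_id(snapshot: dict[str, str], parent_id: Optional[str], message: str) -> str:
    """Toy commit id covering the snapshot, the parent's id and the message."""
    h = hashlib.sha1()
    for path in sorted(snapshot):
        h.update(f"{path}\0{snapshot[path]}\0".encode())
    h.update(f"parent {parent_id or ''}\n{message}".encode())
    return h.hexdigest()


def build_history(snapshots: list[dict[str, str]]) -> list[str]:
    """Chain the snapshots into commits; each id depends on the previous id."""
    ids: list[str] = []
    parent: Optional[str] = None
    for i, snapshot in enumerate(snapshots):
        parent = commit_id(snapshot, parent, f"commit {i}")
        ids.append(parent)
    return ids


history = [{"a.txt": "v1"}, {"a.txt": "v2"}, {"a.txt": "v2", "b.txt": "new"}]
original = build_history(history)
tampered = build_history([{"a.txt": "v1 (edited)"}] + history[1:])

# Only the first snapshot was edited, yet every id from there on changes.
print([o == t for o, t in zip(original, tampered)])  # [False, False, False]
```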

In short: merge commits have multiple parent commits. So your tree tracing logic bifurcates at that point. The commits in the merged history are not altered by the merge commit; they each have a single parent commit (unless they are also merge commits).

aka Git is the simplest crappiest implementation that can work. Then they tacked a terrible UX onto it and shipped it. Chaos ensued and we as devs have spent the last almost 20 years fighting over trying to understand the chaos. We would have been way better off staying with SVN or Mercurial or Fossil (or pretty much any other VCS), but that ship has sailed and now we are stuck in the chaos.

Now nobody understands their VCS and nobody will ever understand it, as the explanations are either too detailed and miss the forest for the trees or too high level and skip over all the trees that kill you when you accidentally run into them.

I hope someone somewhere manages to convince us all to move to something sane.

With all due respect, what a load of crap!

SVN, branching that takes forever instead of a simple file with a commit hash in it? Are you serious?

Mercurial, when I had to use it for almost 2 years? The single thing I missed the most is the fact that "everything is just a label" (or pointer if you will).

Fossil I can't comment on with certainty, as I never really used it.

This is probably gonna get voted into oblivion, but most devs just don't get VCS, period (yes I'm a dev, so this is an inside perspective). Git or not. They didn't get it in CVS days, they didn't get it in SVN days and they didn't get various commercial ones either. Somehow most devs just don't grok version control trees.

But so far git is the single best VCS I have got to use. Everything that I actually need day to day follows from grokking the one simple rule: All those branches and tags are just labels/pointers/sticky notes and you can move them around at will and it's fast.

I do get that there are tricky situations with octopus merges and all that jazz and the Linux kernel and a few other open source projects are probably some of the trickier use cases to understand. For your run of the mill corporate situation? Keep `master`/`main`/`whateveryoucallit` history straight with one commit per feature/change by doing rebases and squashing and you will never have a single minute of misunderstanding what is going on. IFF you grok that "everything is just a label" and you've actually "created a branch" by just creating a file in `.git/refs/heads/` yourself!

And the second rule is: before you try anything, just: commit! And never close that terminal window. You might need that commit hash to reattach a label to it after you "destroyed" your branch (but git has not garbage collected yet and it's all actually still there). Or someone else still has your commits and you just get them from there and reattach a label. It's really so simple.

Look, we have spent the last ~ 20 years across HN and conferences and what not trying to teach git to people with basically nothing to show for it. Most developers still can't do much more than occasionally commit stuff. They still rm -rf their tree whenever something goes wrong.

Your entire hate for SVN was that branching takes a while (because it requires a round-trip to the server). SVN was easy to reason about; you didn't have entire conference talks trying to explain how SVN works, so people don't have to rm -rf their entire tree and re-check out every week.

And even you, who claim to understand git seem to not understand git reflog. I think that clearly sums up my argument perfectly fine :)

Git is NOT easy to reason about, and I've never seen a website or blog post or conference talk about git that wasn't factually inaccurate in some way, and yet they still continue to proliferate. If we don't come up with a better git, the next decade will still be spent trying to teach git to people that will never understand it.

> never close that terminal window. You might need that commit hash to reattach a label to it after you "destroyed" your branch

That's where the reflog comes in handy.

I used svn only a few years ago and don't remember branching taking very long. Maybe it was CVS that had to copy all the files.

I do remember merges being horrible in svn. Just branching off trunk, doing some work and merging back is fine, but if you try to merge from trunk to your branch to "catch up" you're in for some pain when you later want to merge to trunk.

Also svn treats adding and deleting files as different from just editing, more so than git does.

A key point with git is that every clone is effectively its own set of branches, even if they have the same name. The mechanisms you use for synchronizing your local branches with some remote branches are exactly the same as the mechanisms you use between your local named branches.

Git was actually designed initially for email based workflows where there was no central remote at all. Basically, that works by exporting patches and then applying them to your local branch. The branch name isn't even part of the patch.

A git patch is just a textualized form of the list of commits you created locally. You can apply them to any branch you like. As long as you and whoever applies the patch have a common ancestor commit, the patch may merge cleanly. It's good hygiene to ensure it does, for example by rebasing/merging/squashing before you email somebody your patches. If that somebody is called Linus Torvalds, he's going to be pretty strict about things like commit messages and things not being a spaghetti ball of merges, reverts, forks, etc. Your mess, your problem. Linux development still works via mailing lists. And forget about emailing him directly with a patch; you need to use the mailing lists like everybody else. And he works with a network of senior contributors that screen everything that comes in and that aggregate all the patches coming from upstream. So, he only gets involved at the end of the process.

Of course the rest of us use network protocols to sync our repositories. But the important distinction here is that this is a two-step process. First you fetch content from the remote. This simply ensures you have all the commit objects you need in your local git database. Any branches you have are simply text files in .git/refs/heads whose content is the hash of the commit they point to. Remote branches are the same but live in your local .git/refs/remotes/<remotename>. Those branches are named something like origin/main to make it clear that they are local copies of the branches from the origin remote. And then you rebase/merge between your local and "remote" (i.e. also local) branch as needed. Pull is just shorthand for doing both steps in one go. All merges are local. Same with rebases.
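The two steps can be modeled in a few lines of Python. This is a simplified sketch with hypothetical names (`Repo`, `fetch`, `merge_ff_only`, `pull`): commits are bare ids with parent lists, and the merge step only handles the fast-forward case, whereas a real `git pull` may create a merge commit or rebase depending on options and configuration. The point it shows is that fetching moves only `origin/main`, never your own `main`.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    objects: dict[str, tuple[str, ...]] = field(default_factory=dict)  # commit id -> parent ids
    heads: dict[str, str] = field(default_factory=dict)    # like .git/refs/heads/<name>
    remotes: dict[str, str] = field(default_factory=dict)  # like .git/refs/remotes/<remote>/<name>


def fetch(local: Repo, remote: Repo, remote_name: str = "origin") -> None:
    """Step 1: copy missing commits and update remote-tracking refs only."""
    for cid, parents in remote.objects.items():
        local.objects.setdefault(cid, parents)
    for name, cid in remote.heads.items():
        local.remotes[f"{remote_name}/{name}"] = cid


def is_ancestor(objects: dict[str, tuple[str, ...]], a: str, b: str) -> bool:
    stack, seen = [b], set()
    while stack:
        c = stack.pop()
        if c == a:
            return True
        if c not in seen:
            seen.add(c)
            stack.extend(objects[c])
    return False


def merge_ff_only(local: Repo, branch: str, upstream: str) -> bool:
    """Step 2, purely local: move `branch` forward if it is behind `upstream`."""
    if is_ancestor(local.objects, local.heads[branch], local.remotes[upstream]):
        local.heads[branch] = local.remotes[upstream]
        return True
    return False  # diverged: a real merge commit or a rebase would be needed


def pull(local: Repo, remote: Repo, branch: str) -> bool:
    """'pull' = fetch, then integrate locally."""
    fetch(local, remote)
    return merge_ff_only(local, branch, f"origin/{branch}")


server = Repo(objects={"c1": (), "c2": ("c1",)}, heads={"main": "c2"})
laptop = Repo(objects={"c1": ()}, heads={"main": "c1"})
fetch(laptop, server)
after_fetch = laptop.heads["main"]  # still "c1": fetch never touches local branches
merge_ff_only(laptop, "main", "origin/main")
print(after_fetch, laptop.heads["main"])  # c1 c2
```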

Most of the conventions people project on git are kind of cultural and vary between people and companies. It's helpful to read up on the git internals in the Git book. GitHub is sort of an opinionated take on this that back in the day made people coming from centralized version control systems like Subversion feel at home by providing a central repository and allowing them to push their changes there or "share" branches there. Not necessarily a great idea for bigger projects, and limiting write access is common on GitHub.

> A git patch is just a textualized form of the list of commits you created locally. You can apply them to any branch you like.

Not even to a branch. You can combine two unrelated repositories, and in theory you could cherry-pick commits from one of the original repositories to the other.

Of course, in practice this rarely works because the files mentioned in the commit don't exist in the other repository. But there's nothing in git's mechanisms that stops this: it's just a bunch of commits, which are actually just a bunch of file contents.

The whole "diff" or "patch" concept in git is just a way of doing data presentation: rather than showing you the actual commit, which is generally not helpful, it shows you the difference between the contents of the commit and the same files in the state referenced by the previous commit.
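A small Python sketch of that idea: each commit is modeled as a full snapshot (a hypothetical dict of path to content), and the patch view is computed on demand with the standard library's `difflib`. Git uses its own diff machinery, not difflib, so the output only resembles what `git show` prints.

```python
import difflib

# A commit stores whole snapshots; these two stand for a commit and its parent.
parent_snapshot = {"greet.py": "def greet():\n    print('hi')\n"}
commit_snapshot = {"greet.py": "def greet(name):\n    print(f'hi {name}')\n"}


def show(parent: dict[str, str], child: dict[str, str]) -> str:
    """Render a unified diff between two snapshots on demand; no diff is stored anywhere."""
    out: list[str] = []
    for path in sorted(parent.keys() | child.keys()):
        out.extend(difflib.unified_diff(
            parent.get(path, "").splitlines(keepends=True),
            child.get(path, "").splitlines(keepends=True),
            fromfile=f"a/{path}", tofile=f"b/{path}",
        ))
    return "".join(out)


patch = show(parent_snapshot, commit_snapshot)
print(patch)
```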

Every git commit except a root commit has a parent. Applying git patches to a repository without the parent is not going to work. The repository must have the parent commit. You might force it to work without that but it's going to create conflicts, complicate merging, etc.

The reason is that a git patch is not merely a diff but an export of the actual commit objects and referred content (trees, blob diffs, hashes, etc.). Applying the patch recreates the exact commit objects on the other side. The end state is exactly the same as if you would have merged the commits from some branch. There is no difference.

git diff shows you a normal diff. It's not the same thing. You can indeed apply such diffs to your local working copy. But that's not the same thing as a git patch.

The commands you need are git format-patch (exports the patches) and git am (applies them as commits); git apply only applies a diff to the working tree.

> Applying git patches to a repository without the parent is not going to work.

If that's the case (and I'm not saying it isn't - I really don't know), how does cherry-pick work?

I know I can cherry-pick commits in the same repo from entirely separate commit chains, where the only common ancestor is either my local HEAD or some other commit way below both of us.

Why would that not work for different repositories?
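For what it's worth, `git cherry-pick` works by a three-way merge: the picked commit's parent is the merge base, the picked commit is "theirs", and your HEAD is "ours". That needs the relevant objects in your local object database (for another repository you would `git fetch` from it first), but no shared history. A file-level Python sketch of the idea, with a hypothetical `cherry_pick` function; git itself merges hunk by hunk, so it can resolve cases this sketch reports as conflicts:

```python
def cherry_pick(base: dict[str, str], picked: dict[str, str],
                head: dict[str, str]) -> dict[str, str]:
    """File-level three-way merge.

    base   = snapshot of the picked commit's parent
    picked = snapshot of the picked commit ("theirs")
    head   = snapshot we are applying onto ("ours")
    """
    result = dict(head)
    for path in base.keys() | picked.keys():
        before, after = base.get(path), picked.get(path)
        if before == after:
            continue  # the picked commit did not touch this file
        if head.get(path) not in (before, after):
            raise ValueError(f"conflict in {path}")  # ours changed it differently
        if after is None:
            result.pop(path, None)  # the picked commit deleted the file
        else:
            result[path] = after
    return result


parent = {"README": "hello\n", "app.py": "v1\n"}
picked = {"README": "hello\n", "app.py": "v2\n"}  # the fix touches app.py only
elsewhere = {"README": "unrelated history\n", "app.py": "v1\n"}
print(cherry_pick(parent, picked, elsewhere))
# {'README': 'unrelated history\n', 'app.py': 'v2\n'}
```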

> The branch name isn't even part of the patch.

More generally, the branch name is not stored with commits; it has to be computed by walking back from the branch tips, and commits can have multiple possible branch names based on this. In other words, git does not preserve information about which branch was actually active when you made a particular commit. (Mercurial, by contrast, stores the branch that was active when a commit was made in the commit, so that information is preserved.) The article discusses some implications of this, although it doesn't phrase it quite the way I did above.
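Since the branch name is not stored in the commit, "which branches contain this commit?" has to be recomputed by walking back from every branch tip, which is roughly what `git branch --contains <commit>` reports. A minimal Python sketch over a made-up commit graph:

```python
parents = {  # hypothetical commit ids -> parent ids
    "c1": [], "c2": ["c1"], "c3": ["c2"], "f1": ["c2"],
}
refs = {"main": "c3", "feature": "f1"}  # branch name -> tip; nothing else records names


def reachable(tip: str) -> set[str]:
    """All commits reachable from `tip` by following parent pointers."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen


def branches_containing(commit: str) -> list[str]:
    return sorted(name for name, tip in refs.items() if commit in reachable(tip))


print(branches_containing("c2"))  # ['feature', 'main']: c2 is "on" both branches
del refs["feature"]
print(branches_containing("f1"))  # []: the commit still exists, but no name reaches it
```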

> Branches are pointers to a commit and that pointer is refreshed when a new commit is created. One could say they are a wandering tag (without explaining a tag for now).

A good name for these "wandering tags" would be "heads", since it's what git calls them internally (for instance, when not packed, they're stored at the "refs/heads/" path in the repository). This also exposes a distinction between a "branch" and its "head", and that distinction can be useful.

Just to make this explicit: a branch in the chain of commits. The start of the chain is pointed to by a head. When you create a new commit on a branch, the head (of that branch) changes to point to the new commit.

Arrggh. A branch IS a chain of commits.

So would it be correct to say that 'heads' are the LEAVES of the commits-tree?

Or are there such leaves which are not considered 'heads'?

No. There are leaves that aren't heads (for example, after you delete a branch, the old commits just lie around until someone deliberately cleans them up), and there are heads that aren't leaves (for example, if you branch a new feature branch from main and commit to it: the feature branch branches off of main, so main is not a leaf anymore).
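Both counter-examples can be checked with a few lines of Python over a made-up graph: `x1` is left over from a deleted branch (a leaf that no head points to), and `main` has had `feature` branch off it and gain a commit (a head that is not a leaf).

```python
parents = {  # hypothetical commit ids -> parent ids
    "c1": [], "c2": ["c1"], "c3": ["c2"],
    "x1": ["c2"],  # tip of a branch that has since been deleted
    "f1": ["c3"],  # first commit on feature, branched from main at c3
}
refs = {"main": "c3", "feature": "f1"}


def leaves(graph: dict[str, list[str]]) -> set[str]:
    """Commits that are nobody's parent."""
    has_child = {p for ps in graph.values() for p in ps}
    return set(graph) - has_child


heads = set(refs.values())
print(sorted(leaves(parents) - heads))  # ['x1']: leaf, but not a head
print(sorted(heads - leaves(parents)))  # ['c3']: head (main), but not a leaf
```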

I think this is covered adequately (if less completely) in the "technically correct" definition section.

I cannot be the only one that gets away with only knowing:

  git pull
  git merge x
  git checkout [-b] foo
  git commit
  git push

protip: If you want to switch branch you can now use "git switch [-c] foo". If you want to restore files you can do "git restore .".

Basically you can stop using checkout.

*edit*: fixed switch branch creation parameter.

"THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE."

I will keep using them so I can keep using old software. How new is it? Does ubuntu or debian have it?

They were introduced in git 2.23, released in August 2019.

And is in Ubuntu 22.04 and Debian 11.

I think perhaps you meant "git switch [-c] foo"

Correct thank you.

Until it's actually removed, muscle memory will keep me using checkout.

For me, git becomes unintelligible when there's a crazy train-track map of branching and merging.

So I usually do whatever I can to keep a single straight-line master history.

I branch off for a task, then after a while my branch isn't joined at the tip of master. So I rebase locally until it is. Then when the PR happens, master gets my changes added to the top, with no extra noise from merge commits.

Even if the local-rebase workflow is slightly more complicated, the payoff is a really clean history, making future reasoning about branches much easier. Not to mention merge conflicts are easier to solve when you rebase early & often.
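What the local rebase does to the graph can be sketched in Python. This is a toy model with hypothetical helpers (`commit`, `history`, `rebase`) and a made-up id scheme: it replays each commit of the task branch on top of the new master tip. Because a commit's id depends on its parent, the replayed commits get new ids, and the originals stay in the object store until garbage collection. Real `git rebase` computes the fork point itself and re-applies each commit's diff, possibly stopping on conflicts; here the fork point is passed in explicitly.

```python
import hashlib
from typing import Optional

graph: dict[str, tuple[Optional[str], str]] = {}  # commit id -> (parent id, message)


def commit(parent: Optional[str], message: str) -> str:
    """Toy commit whose id depends on its parent's id, as in git."""
    cid = hashlib.sha1(f"{parent}\n{message}".encode()).hexdigest()[:7]
    graph[cid] = (parent, message)
    return cid


def history(tip: Optional[str]) -> list[str]:
    out = []
    while tip is not None:
        parent, message = graph[tip]
        out.append(message)
        tip = parent
    return out[::-1]  # oldest first


def rebase(branch_tip: str, onto: str, fork_point: str) -> str:
    """Replay the commits after `fork_point` up to `branch_tip` on top of `onto`."""
    to_replay = []
    tip: Optional[str] = branch_tip
    while tip != fork_point:
        parent, message = graph[tip]  # KeyError if fork_point is not an ancestor
        to_replay.append(message)
        tip = parent
    new_tip = onto
    for message in reversed(to_replay):
        new_tip = commit(new_tip, message)  # same change, new parent => new id
    return new_tip


base = commit(None, "base")
master = commit(base, "master: hotfix")
task = commit(commit(base, "task: step 1"), "task: step 2")
rebased = rebase(task, master, base)
print(history(rebased))  # ['base', 'master: hotfix', 'task: step 1', 'task: step 2']
```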

That's because you've apparently used some command to look at that history; none of his commands ever show that.

In my observation, the problem comes when there is a "merge main to the feature branch before merging"

This step is when newbies get confused because the diffs are the wrong way, and I've seen people often losing merge hunks from master when there are conflicts which can be disastrous (not a git issue, I've seen people do the same in SVN).

The proper solution IMO would be for git to have a "merge --reintegrate" which would do the opposite merge: take the main branch and merge the current feature branch to it... after success, you have a new feature branch.

This is why I also prefer rebase and cleaner history (but a common mistake here is to squash after PR approval... that should be done before, not after).

I think this is a pretty sensible approach. Git feels like one of those tools where you are given a lot of power but it's your responsibility to use it in a sensible way. A bit like with Excel or spreadsheets. You can do a lot but you can also make a big mess pretty quickly.

> For me, git becomes unintelligible when there's a crazy train-track map of branching and merging.

Consider finding peace by inverting your perspective: if your organization’s development process (for whatever reason, many of them legitimate) involves a crazy train-track of features being developed in parallel, isn’t it great that you're all at least using something that can keep track of it?

For a small team that should be unnecessary.

Frequent rebasing when you’re on a side branch of development is smart, but doesn’t conflict with my point.

Also, really, who looks back into the depths of history? There’s a reason a lot of backup schemes rotate a set of tapes over 30 or even 14 days. For that reason I am not a fan of rewriting history for “clarity”: I consider it wasted effort.

For the same reason I don’t care about branches for explorations that turned out to go nowhere — just mark the head abandoned and stop worrying about it.

Possibly `git branch NEWBRANCHNAME` instead of `git checkout -b NEWBRANCHNAME`. When I need to show git to someone in order for them to contribute to something, I give them only these incantations --- and instructions to ask me if weird git things happen.

It's `git switch [-c]` nowadays.

I've given up trying to configure git to push the branch I'm on, so I do this little dance each time:

    git switch -c foo
    git push 
    > did you mean git push --args-with-branch-name?
    sigh, copy, paste, enter

You then need to do both `git branch xxx` and `git checkout xxx` though.

If you teach "checkout to move around and add -b when moving to a new branch the first time" that works pretty well

The most important command:

    git reset --hard

This is my favorite. It allows one to easily create "service" branches based on tags where you apply to that tag a select set of commits from the development branch that you can then easily deploy to PROD without including the rest of the (perhaps not sufficiently tested) commits and without having a convoluted branching strategy.

Or:

  git reset --soft HEAD~1

Unless you are superhuman and never make mistaeks, you should put "git rebase" (with and without -i) on top of your list of things to learn.

Let us also hope they do a git diff before pushing changes.

Really?? I would have a very hard time without:

  git log
  git show
  git status
  git blame
  git diff
  git add
  git reset

My "minimalist" list of git commands:

    git add
    git blame
    git branch
    git checkout
    git cherry-pick
    git clone
    git commit
    git diff
    git fetch
    git log
    git merge
    git pull
    git push
    git reset
    git rm
    git stash
    git status

17 commands in total. I don't think it is possible to be a professional software engineer without being familiar with them. Granted, some of these you may need more frequently than others.

My guess is, the OC relies on their IDE/editor for the functionality provided by some of these commands. But then, why not just go all the way. Just use all VC features provided by your IDE, and claim that you need zero git commands.

Git is conceptually simple but has a baroque UI. I really recommend spending 15 minutes to understand it conceptually:

objects: blobs, trees, commits, annotated tags

refs: branches (local, remote-tracking), tags, HEAD

other: working tree, index (aka cache aka staging area), remotes

There's lots of good guides out there: git from the bottom up, git for computer scientists, and the git parable are some that spring to mind. The Git Book is also excellent but it's more than 15 minutes of your time.

All the commands become way less mystifying when you understand what they are manipulating. You'll never get into a state where you want to `rm -rf` the entire repo and start with a new clone. There will hopefully be no more teeth grinding or keyboard mashing.

I've used a lot of VCSs over the years (rcs, sccs, cvs, subversion, clearcase, mercurial, git) and I swear git is the one I find least frustrating. The others may have had simpler interfaces, but they were either conceptually more complex, overly rigid in their design and behavior, or both (looking at you clearcase).

Look, I get it: a lot of folks see git as a necessary evil that's part of their day job. I disagree. I think it's really worth spending the time to learn well, probably like you invested some time in your editor and other tooling.

I mean, it's easier than C++. :-)

> The others may have had simpler interfaces, but they were either conceptually more complex, overly rigid in their design and behavior, or both

Most of the time, I'll humbly trade "overly rigid in their design and behavior" for "simpler interfaces". Please, master VCS, discipline me!

`git add` sometimes

You don't need to know many more commands than those (maybe aside from "reset", "rebase" and "fetch", which definitely come in handy regularly, and maybe a few more for showing status or browsing the commit graph, unless you use some GUI for that), as whenever you need anything else you'll usually just look it up anyway, either in the man pages or on the Web. However, if your mental model of git is limited to these commands only, you're doing yourself a disservice that leads to https://xkcd.com/1597/

you can't make do with only that if you're in a team of 2 even

I'm nothing without git rebase -i.

I know a decent amount of git, but day to day I use GUIs (Sublime Merge).

Because your IDE shows line-based blame and allows to check out old file versions via right-click? :) Mee too

You’re definitely not the only one, but you will level up if you learn branching. After that you can become the guru amongst your peers and a god among men if you learn how to use ‘git reflog’. ;) That will help you learn how fix almost any git accident.

you're def not the only one. but learning more pays dividends in confidence and resilience. fyi, pull is just shorthand for fetch && merge

I manage with even less, any merging under my watch happens as a strategy with pulling

If you get errors, save your work elsewhere, delete the project, and download a fresh copy.

Yes, this is very good. Also, rebasing is evil.

that's my list too, pretty much (+ the obvious `git add` and `git status` others have mentioned)

I throw in a `git reflog` once in a while, and the other day I patted myself on the back for my first use of `git tag -a mytag -m "This is my first tag!"` followed by `git push origin mytag`. I felt like god

now as much as I hate Xcode, its UI for looking at all my current changes and staging them by blocks of line for commit is like a superpower. as a solo developer working on brand new code, not all of my lines of thinking follow very atomic git commits, so it's nice to separate, say, some refactoring code from some actually new functionality when committing changes

FWIW that's 99% of my usage of git

(well... it's about 1% because I do everything using the eclipse git UI, but that's the same behavior you get from those commands)

I definitely can't work without `git add`. Other commands I use daily or almost-daily (frequently enough that I have aliases for them) are `git add -p`, `git commit --amend`, `git rebase`, `git rebase -i`, `git stash`...

Then there's of course `git log`, `git diff`, and `git status`, but I presume you know about those as well.

To clean up the history before asking for a review:

    git rebase HEAD~2 -i
    git commit --amend

You are probably missing "git add" but otherwise, it is fine... unless you fuck up.

Unfucking things is what most of my knowledge of git goes to. Committing in the wrong branch, the wrong files, starting from the wrong commit, etc... Before you push, almost all mistakes are fixable, but it requires knowing a few more commands.

Then there are the project specific things. For instance, I worked on a project where we didn't have a central server we could push to and pull from (airgap). So we had to work with bundles. Git does that really well (it really is decentralized), but it is uncommon. Some people prefer a rebase-based workflow, some use cherry-picking extensively, some projects are more prone to conflicts than others, etc...

you don't need git add? no git status? i take my hat off to you, sir!

I don't use git at work, but in my private hobby projects my friends usually get mad when they watch me juggle changes and branch pointers with git reset --hard and git stash...

How do you undo a merge that you didn't mean to do/did wrongly?

    git reset --hard <last commit before merge>

Have some cosmetic fixups on your local branch that really should go into main (or a separate branch) first before merging a bigger feature?

    git stash
    git checkout main
    git stash apply

By thinking about branches as pointers, the commit graph existing independently, and stashes just being temporary commits, I feel I'm working much more directly with the underlying abstraction. Yes, git has commands for specific combinations of actions, but for an occasional user it's harder to remember every such command and which arguments and flags to pass in which order. It's either "look through documentation until you find graph diagrams illustrating what will happen for this order of arguments and flags" or "use the primitives 'move branch pointer', 'commit to branch', 'hold these changes for a second' to obtain the commit tree you actually want". Knowing that the reflog exists also makes this insane-sounding working mode pretty non-scary. And yes, some operations (e.g. cherry-pick) you just need to do the "real" way.

(My git stash obsession is most likely just damage from years of using Perforce, which doesn't have a modified/staged distinction. The only way to commit only part of a changed file is via the equivalent of stash -> [restore half the file] -> commit -> stash pop.)
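Why the reflog makes pointer juggling non-scary can be sketched as a pointer that remembers every position it has had. This is a toy model under stated assumptions: the `Ref` class is hypothetical, and real `HEAD@{n}` reads HEAD's reflog (which also records checkouts), whereas this sketch only logs the moves of a single branch.

```python
class Ref:
    """Hypothetical branch pointer that keeps a reflog of its past positions."""

    def __init__(self, commit: str) -> None:
        self.commit = commit
        self.reflog = [commit]  # oldest entry first, newest last

    def move(self, commit: str) -> None:
        # Commits, merges and resets all just move the pointer, and every
        # move is appended to the log, so no earlier position is forgotten.
        self.commit = commit
        self.reflog.append(commit)

    def at(self, n: int) -> str:
        """Resolve the equivalent of `<ref>@{n}`: where the ref was n moves ago."""
        return self.reflog[-1 - n]


main = Ref("a1")
main.move("a2")        # an ordinary commit
main.move("m1")        # the merge you did not mean to do
main.move(main.at(1))  # roughly what `git reset --hard HEAD@{1}` does to the branch
```

The reset is itself logged, so even the "undone" merge commit stays reachable through the log until git eventually expires old entries.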

Prepares to be crucified...

git reset --hard is actually dangerous, because it throws away local modifications that were not yet committed. To undo just the commit and not the work, you should use git reset --soft (to undo just the git commit) or git reset --mixed (to undo both the git commit and the "git add"s leading up to the commit).
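A toy model may make the three modes easier to keep apart. It is a sketch under loud assumptions: the checked-out commit's snapshot, the index and the working tree are each a plain dict of path to content, and `Repo` and `reset` are hypothetical names, not git internals. Real `git reset --hard` only overwrites tracked files and leaves untracked ones alone, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Hypothetical toy repository: every tree is a dict of path -> content."""
    commits: dict                                 # commit id -> snapshot
    branch: str                                   # name of the checked-out branch
    refs: dict                                    # branch name -> commit id
    index: dict = field(default_factory=dict)     # the staging area
    worktree: dict = field(default_factory=dict)  # files on disk


def reset(repo: Repo, target: str, mode: str = "mixed") -> None:
    """Mimic which areas `git reset --soft/--mixed/--hard <target>` overwrite."""
    if mode not in ("soft", "mixed", "hard"):
        raise ValueError(f"unknown mode: {mode}")
    # Every mode moves the current branch pointer; that alone undoes the commit.
    repo.refs[repo.branch] = target
    snapshot = repo.commits[target]
    if mode in ("mixed", "hard"):
        # --mixed (the default) also rewrites the index, undoing the `git add`s.
        repo.index = dict(snapshot)
    if mode == "hard":
        # Only --hard rewrites the working tree, discarding uncommitted edits.
        repo.worktree = dict(snapshot)
```

Starting from a working tree with an uncommitted edit, only `mode="hard"` loses that edit; the other two modes leave it on disk.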

git checkout will also happily throw away local unstaged modifications, and I would argue that it is even more dangerous because I did not have to type "--hard" to shoot myself in the foot.

That does not sound right at all. I’m pretty sure there’s a warning when you try to check out a branch when that would overwrite local unstaged changes. I might be wrong, but I’d like some proof.

In some repo you have, make some changes to files that are tracked. Git status will show you "changes not staged for commit: <more stuff>". Run "git checkout .", and check the status command again, it will be in a clean state.

    > git status
    On branch feature/redacted
    Changes not staged for commit:
      (use "git add <file>..." to update what will be committed)
      (use "git restore <file>..." to discard changes in working directory)
            modified:   redacted.py

    no changes added to commit (use "git add" and/or "git commit -a")

    > git checkout .
    Updated 1 path from the index

    > git status
    On branch feature/redacted
    nothing to commit, working tree clean

git checkout is overloaded and has two use cases. git checkout $branch is safe, git checkout $file_or_folder_name will help you shoot yourself in the foot.

Checkout is also used for reverting changes to unstaged files.

There must be something you do terribly wrong if you believe it happened to you in normal use.

The poster is talking about when you do something like this: `git checkout .` That wipes out all unstaged changes on tracked files in the current directory

That's as unfortunate as 'rm *' (and good to be aware of BTW), but doing this deliberately doesn't qualify as "normal use" for me.

In my career, it has already happened more than once. And definitely not deliberately.

There are many ways this can happen accidentally. You could be trying to check out a branch and tab-complete your way into a folder name instead. Or you could be typing Alt+. (or Esc, then .) intending to insert the last argument of the previous bash command, the keyboard glitches, and you end up with a literal . instead.

Just because it hasn’t happened to you yet doesn't mean you should discount the experience of others. That’s like saying that because you have never messed up with dd, there are no problems with dd’s design.

That's the reason why it was replaced by two separate, more sensible commands: git switch for switching branches etc, which is safe, and the inherently dangerous git restore for reverting changes in your working directory.

It wasn't replaced, the other two commands were added to the cli.

It will take decades before git checkout will actually get replaced by git switch/restore in all the git books, tutorials and search results. Most normal users will keep learning and using git checkout in the meanwhile.

I understand why they can't actually deprecate existing commands. The git command is used in far too many existing shell scripts across thousands of companies.

I would argue that this is actually a fundamental deficiency of the "unix way" of doing things, where the same command is meant to be used both by human beings and in automated workflows. Automated workflows require backwards compatibility. Humans need easy to use interfaces that don't allow them to easily shoot themselves in the foot. The same tool cannot serve both needs.

That's not what checkout does.

In a repo with unstaged changes, running "git checkout ." will copy files into the worktree and clobber your unstaged changes. Without a commit argument it copies them from the index (hence the "Updated 1 path from the index" message); "git checkout <commit> -- ." copies them from that commit instead.
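Where `git checkout -- <paths>` takes its content from can be sketched like this. The assumptions are loud: trees are plain dicts, `checkout_paths` is a hypothetical helper, and every path in the source is treated as matching the pathspec.

```python
from typing import Optional


def checkout_paths(index: dict, worktree: dict,
                   commit_tree: Optional[dict] = None) -> None:
    """Toy model of `git checkout [<commit>] -- .` (hypothetical helper).

    Without a commit, content is copied from the index to the working tree.
    With a commit, it is copied from that commit into the index and the
    working tree. Either way there is no prompt: unstaged edits are replaced.
    """
    source = index if commit_tree is None else commit_tree
    for path, content in list(source.items()):
        if commit_tree is not None:
            index[path] = content  # a commit source also updates the staging area
        worktree[path] = content   # any unstaged edit to this path is lost
```

Compare this with switching branches, where git refuses to proceed if the switch would overwrite local changes. The path form has no such check, which is why the two uses of the same command feel so different.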

You can checkout files and directories from completely unrelated commits and even unrelated repos!

I have used this to checkout specific useful files from other projects. It's a little nicer than just copying them in, because you can keep the repo tracking branch updated and keep checking out updates, and you can easily compare your file to theirs, and see what changes they have made to the file.

Both of those sound totally reasonable to me! I don't know of any better ways to do that stuff and there's nothing risky about it.

One thing that is risky about git reset --hard is that any non-committed changes are lost. That has bitten me a few times.

My controversial opinion is that git needs some kind of GUI that helps you keep track of the state of the repo.

A very effective solution for that is a well-configured shell. If you summarize the state of the repo in the prompt, it is always visible while you type a command.

Completely reasonable if you do on your local branch, or if you have a convention that remote branches starting with your name or something are yours only.

If you rewrite history on master… well… completely unreasonable.

Not a fan of the staging area, because it won't be tested. I would rather stash some changes to postpone them, then test and commit the workspace.

What does the staging area have to do with tests?

I do that too (except I use a soft reset instead of --hard; I prefer to review and delete the reset changes myself).

I tried using git worktree for a while when working on multiple branches, but it’s a pain to use… Stashing is easier.

I'm working much more directly with the underlying abstraction

Your strategy of seeing things as they are is a useful general purpose life skill.

How do you undo a merge that you didn't mean to do/did wrongly?

I usually used `switch` for this:

    # Check out the previous commit
    git switch -d HEAD~
    # Overwrite the branch
    git switch -C <current branch>

You can check out multiple branches in different directories from a single git repo. This saves me a lot of what used to be stashing.

"Undo" is usually more like `git reset --hard HEAD@{1}`, ie. using the reflog.

Nothing wrong with this at all. Only people who don't understand and/or are scared of git don't like it.

You could also use cherry-pick to "donate" commits to other branches, instead of stash, of course. Magit has some great extra abstractions for this.

why crucified? you're doing exactly what I do. All the people who have any trouble with git whatsoever try to use it as a black box for some high-level idea of what their workflow should be. And git is not that; git is a thin wrapper around a simple and elegant data structure. If you understand it, then everything clicks and git doesn't EVER give you any trouble.

Your friends are unreasonable, unless you collaborate with them on the same branches and rewrite them after you shared them.

also, using stash is only the first level. git cherry-pick, git rebase --interactive, git reset --hard HEAD^ and friends allow you to do such moves and cosmetic extractions after the commit itself. I also prefer to split cosmetic changes from feature changes, so I extract cosmetic stuff to main all the time.

I agree that git is almost asking you to juggle commits.

My preference is to use temporary branches and cherry-picking instead of stashing; I mostly use a GUI* to work with git, so it is easy to select the two or three commits to cherry-pick or to see visually whether an interactive rebase would work.

* https://gitextensions.github.io/

Nah I do that too. I also amend and push --force to my private branch a lot to make the git history easier to follow for whoever is going to do a code review.

A lot of things in git are just pointers to commits, and then the git implementation handles them under the covers in some way that usually makes sense but not always.

One example that also bites people: moving files isn't stored in git - if you move files (even with `git mv`) and create a new commit, the moves aren't stored, but this is reconstructed later by the client based on similarity, which comes from the diff algorithm.

And git has multiple diff algorithms to pick from: https://git-scm.com/docs/git-config#Documentation/git-config...

And you can optionally turn off rename detection in diff output with `diff.renames`: https://git-scm.com/docs/git-config#Documentation/git-config...
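To make "reconstructed later based on similarity" concrete, here is a minimal sketch of guessing renames between two snapshots after the fact. It is not git's algorithm. `difflib.SequenceMatcher.ratio()` stands in for git's own similarity score, and `detect_renames` is a hypothetical name. Only the 50% threshold mirrors git's default for `-M`, and the scores themselves will differ from git's.

```python
from difflib import SequenceMatcher


def detect_renames(old: dict, new: dict, threshold: float = 0.5) -> dict:
    """Guess renames between two snapshots (path -> content).

    Nothing about a rename is stored: candidates are simply paths that
    vanished from `old` and paths that appeared in `new`, paired up if
    their contents are similar enough.
    """
    deleted = [p for p in old if p not in new]
    added = [p for p in new if p not in old]
    renames = {}
    for src in deleted:
        best, best_score = None, threshold
        for dst in added:
            if dst in renames.values():
                continue  # each new path can be the target of one rename
            score = SequenceMatcher(None, old[src], new[dst]).ratio()
            if score >= best_score:
                best, best_score = dst, score
        if best is not None:
            renames[src] = best
    return renames
```

A file that is renamed and heavily edited in the same commit can fall below the threshold, at which point it shows up as a delete plus an add. That is the behaviour people get bitten by.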

moving files isn't stored in git

is there an intuitive and enlightening explanation as to why it is this way?

For the historical rationale see here: https://gist.github.com/borekb/3a548596ffd27ad6d948854751756...

In short, Linus's stance is that file renaming doesn’t matter; only the contents of files matter, and the moving of content between files. Moved or renamed files then fall out as a special case of moving content.

Personally, I think this is a case of the better being the enemy of the good, and his “clearly superior algorithm” doesn’t work as well as claimed in practice. Or maybe tooling merely still isn’t up to snuff after 18 years.

I don't think it's about having a stance, it's about git's architecture. From the commit graph point of view, there's no such thing as moving anything at all, neither files nor content. Commits represent a whole new state of the repository, not a diff from the previous state. The only way a commit is linked to the previous state is via its parent pointer; it can otherwise be completely unrelated (and you can simply change the parent pointer without changing anything else in the commit). Any diffs are calculated at runtime. The issue with renames is just a consequence of this data model. You could try to plaster over it with some metadata, but ultimately you would still be fighting against the model rather than working with it.
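The "snapshots, with diffs computed on demand" model can be sketched in a few lines. This is an illustrative assumption rather than git's object format: each commit is a dict holding a full tree and a list of parent ids, and the `commits` data and `diff` helper are hypothetical.

```python
import difflib

# Hypothetical toy history: each commit stores a complete snapshot plus
# parent pointers. No commit stores a diff.
commits = {
    "c1": {"parents": [], "tree": {"README": "hello\n"}},
    "c2": {"parents": ["c1"], "tree": {"README": "hello\nworld\n"}},
}


def diff(old_id: str, new_id: str, path: str) -> list:
    """Compute a unified diff between any two commits, related or not."""
    old = commits[old_id]["tree"].get(path, "")
    new = commits[new_id]["tree"].get(path, "")
    return list(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"{old_id}:{path}",
        tofile=f"{new_id}:{path}",
    ))
```

Because the diff is derived from two snapshots, `diff` works the same way whether or not `c1` is listed among `c2`'s parents. Rewiring the parent pointer changes the history, not the trees.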

Many people develop a bad mental model with commits as diffs, because that's what the UI makes them think commits are. It can work for a while, but inevitably leads to confusion later on.

As you say, commits link to their parent(s), and those links effectively represent the edges of the commit graph. It makes perfect sense to record moves on those edges. That’s how other VCSs do it. There is no conflict with the commit model.

Viewing the commit graph in terms of nodes (commits) or edges (diffs) is equivalent, these are dual views you can easily convert between. The internal representation is independent from that. Some VCSs use a mix of diffs and full revisions internally. Even Git uses delta compression when packing objects.

What I meant is that git doesn't have any structure to represent an edge other than a simple pointer. Conceptually it wouldn't be a big change to add one, but because there isn't one, everything in git revolves around nodes rather than edges, and whenever the concept of an edge is needed (such as in "cherry-pick") it's calculated on the fly.

If you think of it not as a "rename" (which would belong in the edge object if it existed) but rather as a "note: the file A in this tree was known as B in the parent tree" it would make perfect sense to store it in the child commit.

I don’t see where this would be causing any issues. There is a canonical place where to put edge metadata, namely in the child commit. And whenever you’re interested in move information, you have to process the respective child commit anyway.

Git stores snapshots and that’s it. The whole tree, not per-file.

As to why Linus doesn’t like storing file moves: https://public-inbox.org/git/Pine.LNX.4.58.0504150753440.721...

I'd be happy to argue why Linus is wrong here. Many things would be much easier if git recorded some more metadata in every commit: file moves, and branch moves, to start with.

Having some sort of notion of "parent branch" would be very useful for a number of common operations, and a "renamed file" without having to rely on client dependent heuristics too. Empty files trip people up all the time so a "create file" would fit in perfectly.

These concepts would also be a good basis for more user-friendly clients. Other version control systems do this, so the surprise factor should be low.

People would get lazy and rename a file without telling Subversion they had done it, so it would write a “old file deleted, new file created from nothing” revision. Most of the merge conflict resolution machinery just couldn’t run without the missing guidance. Git infers someone probably renamed a file you edited or vice versa, which seems risky but works better in practice.

Man, he communicates like a dick all the time I guess.

He does argue in a borderline hysterical way on many occasions.

It's kind of funny to see Linus browbeat other people into submission regardless of whether he is right or not, while claiming "I am always right".

A few counter points:

- `hg` has `cp`, and I believe both Meta's and Google's internal systems have that;
- git has `mv`, which was added later, but it is really janky: git forgets that files were moved, which I think is because git doesn't try to track that, likely because of the philosophy here;
- as for storing file moves: nobody said you *have* to use this information, but you can certainly use it to help with things.

The whole thread is an interesting read though and I will try going through it someday - maybe doing that would change my mind.

Git doesn't store any individual changes: files moved, lines added, line deleted, etc.

It stores a commit graph, and a tree at each of those commits. (A lossless compression algorithm deduplicates information.)

There's no need for the author to be concerned with what diffing information gets incorporated into the commit. Diffs are up to the viewer of the commit history.

  git show --diff-algorithm=...

Yup. “Storing moves” is the kind of thing that might sound intuitively obvious but gets gnarly and non-obvious when you think about it for five minutes. Something that might seem “obvious” to do turns out to be so non-obvious (how do you catch all file moves, i.e. the intent, outside of simple identical-content cases, and how do you represent them internally?) that you realize that just using snapshots is really the best thing to do.

It’s completely trivial. The obvious and correct place is in the commit object just like author and date and such, since renaming is semantically part of the commit, not the tree:

  commit 0123456789abcdef0123456789abcdef01234567
  parent fedcba9876543210fedcba9876543210fedcba98
  author Nemo <nemo@example.invalid> 1234567890 +0000
  committer Nemo <nemo@example.invalid> 1234567890 +0000
  rename-from path1.old
  rename-to path1.new
  rename-from path2.old
  rename-to path2.new

  Commit message

And you don’t detect moves (because that’s madness), but require that people record them deliberately, just like every other VCS has done. There’s even git-mv already, it just skips a step that every other VCS’s equivalent command would do. (And technically this all works out because the index is a commit, so you can record the rename normally.)

Of course, all of this assumes that moving a file is a meaningful operation. Perhaps ideally (for most languages and systems) you’d track this in far smaller chunks, so that you can track changes to a function even when it alone was moved to a different file. But things like Git aren’t interested in those kinds of semantics, and work technically at the file level, more or less, so I think it should track renames because in practice straightforward renames are super common, but often also involve other changes that thwart rename detection. Years ago Linus explained why he didn’t like storing moves (someone else has linked it), but I’m largely not sold with his reasoning—the theory of the perfect has hindered the useful, and file renames are commonly meaningful in ways more than he said.

The problem with that scenario is that it usually doesn't support the real-world scenario where you do a rename in some tool (like an IDE) and the tool doesn't do the corresponding git operation.

(yes, some IDE might have git integration, but personally I don't like my IDE messing with git, except read-only (annotate, diff))

That’s… nothing special. If you don’t have Git integration in your IDE, you already have to do something like `git mv` or a `git add` and `git rm`. Nothing has changed in this new hypothetical world.

It’s completely trivial.

Like I implicitly said: how to do it beyond the “simple identical content cases”?

But if the solution is for the user to explicitly order renames (i.e., this renamed Java class is a file move) then the solution is indeed simple.

I see the point that Linus was making that you may want to be able to see “function moves” and so on. But in practice I am very often interested in file moves since you can inspect the file history easily in Git—except when you hit some wall because someone renamed the file. Then you need to re-run the command with `--follow`. Contrast all of that with a function move... I almost never can summon the will to fish out the incantation (like a regex or a robust line range) which will give me the history of a function across intra- or inter-file moves and so on.

BitKeeper already did it.

I think this is the one thing I feel BitKeeper does better than Git. Git can get confused about where a file came from, for moves but especially for copies, and so the version history ends, even if you ask it to try and follow along. BitKeeper, on the other hand, keeps the moves and copies as part of the history, so you can always trace it through to the origin of the file, no matter how circuitous.

git log has --follow, but unfortunately it only works when specifying a single file and not, e.g., a whole directory.

My TL;DR for git commits is that they are connected like a linked list, but in reverse, and have more pointers than just head/tail. I recommend having a look at Merkle trees. I don't understand the git CLI, but with a basic understanding and a good git UI I can manipulate git commits, branches, tags etc. well.

This has resulted in a feature not in VCSs that do track renames: using matching lines, git blame can track changes across files that were combined in a commit, where others would record half the lines as being a rename from one file and the other half as new lines (if you even thought to do it like that when making the commit; more likely the whole file would be tracked as new).

in general, even if people’s intuition about a topic is technically incorrect in some ways, people usually have the intuition they do for very legitimate reasons!

This is worth an essay of its own.

I guess.

To me, the opposite is a more worthy essay: why, with all the power to customize our tech, do we create things that consistently work differently than people's intuition?

The fact that it "mostly jibes" feels like a footgun, not a feature.

I get that for some, "git just works! It made sense from day one" but in my limited experience, 0% of people I've worked with have said that.

Sure, we can all learn the tech. And expert techniques in any field often don't jibe with naive expectations. But for me and the folks I work with, the tech industry feels like it's gliding towards inscrutable tools rather than ease of use.

We've hit a stage where many rely on code completion bots and answer-supplying bots instead of being able to directly embrace our tech. I wish the tech was more approachable on its own, but perhaps this is the natural evolution of things.

As one of those people that thinks it's extremely intuitive, I have to wonder where the confused people are learning about git. The documentation on the site[0] is quite clear:

A branch in Git is simply a lightweight movable pointer to one of these commits. The default branch name in Git is master. As you start making commits, you’re given a master branch that points to the last commit you made. Every time you commit, the master branch pointer moves forward automatically.

It has multiple diagrams explaining how commits point to their content and their parents, and branches point to commits. The Pro Git content has been there for at least 10 years (it's what I learned from 10 years ago).

Maybe the problem is just that the Internet is full of blogs that have incorrect diagrams (like those in the OP) and bad explanations, despite the main website having great documentation!

[0] https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-N...

If Git was "extremely intuitive", and the documentation was "great", why would so many otherwise smart people keep writing blogs about it with incorrect diagrams?

What is your theory about why so many people are having difficulty creating a correct mental model about Git, and why so many people are writing incorrect blogs about it?

Like I sort of implied, my theory is people haven't read the docs on the official site (or the book that's on the site), and keep regurgitating bad information that they read on some blog or howto site. I don't know why they do this. I don't make these sites, so I don't know what motivates people who do, especially people who don't understand what they're writing about.

If you understand the basic design premise (commits are content-addressed immutable snapshots), the pointer stuff is kind of obvious. It has to work something like that for it to be able to be immutable if you want to be able to make branches/tags after the commit is created.
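"Content-addressed" can be shown directly. Git names an object by hashing a short header (type and size) plus the object's bytes, so a commit's id covers its tree, its parents and its message. The sketch below assumes the default SHA-1 object format. `git_object_id` and `commit_id` are hypothetical helpers, and the author/committer lines are made up.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """Object id as git computes it in the default SHA-1 object format:
    SHA-1 over "<kind> <size>", a NUL byte, then the raw body."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def commit_id(tree: str, parents: list, message: str) -> str:
    """Id of a minimal commit object; the author/committer lines are made up."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [
        "author A <a@example.invalid> 0 +0000",     # hypothetical identity and time
        "committer A <a@example.invalid> 0 +0000",
        "",                                          # blank line before the message
        message,
    ]
    return git_object_id("commit", ("\n".join(lines) + "\n").encode())
```

`git_object_id("blob", b"")` gives `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`, the well-known id of the empty blob. Changing a commit's parent changes its id, so an existing commit can never be edited in place. Branches and tags therefore have to be separate, movable names that point at these immutable ids.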

In part it is because git is hard to use, in part it is because mostly people learn git by oral tradition and often treat it like sorcery.

Often, people's intuition is wrong on very important ways, and something that works like they expect is sure to create footguns or just blow up by itself.

But I'm not sure git is a case of this. The DVCS that were created following people's intuitions were known to be slow and internally complex, but I have never heard about them failing. (And the slowness is obviously of a kind that can be optimized away.)

We just stuck with the worst UI ever devised in public for a VCS because of network effects.

I totally agree with it being "the worst UI ever devised". It's fine to use commits with parent pointers and branches as pointers to commits and all the other stuff internally. But there should be a UI wrapped around that that maps to operations that make sense for the purpose of working on a software project.

Not this:

git merge [-n] [--stat] [--no-commit] [--squash] [--[no-]edit] [--no-verify] [-s <strategy>] [-X <strategy-option>] [-S[<keyid>]] [--[no-]allow-unrelated-histories] [--[no-]rerere-autoupdate] [-m <msg>] [-F <file>] [--into-name <branch>] [<commit>… ]

You do have a point, but it's not a slam dunk. Intuition isn't some fixed thing but arises from personal experience. A lot of that is common to a culture, but there are different cultures and in any case, some truly personal aspects remain.

There needs to be a balance between creating new, more powerful intuitions, and meeting people at the intuitions they already have.

Case in point, Git's branching model is pretty intuitive when you understand how Linux kernel development works. Perhaps 0% of the people you've worked with have looked into that. That's fine. Different cultures...

Another example that may be worth studying is mathematics and the hard sciences. Learning those is a lot about learning powerful intuitions.

Yeah, few things are actually "intuitive". "Shared familiarity" is probably a better term.

That doesn’t seem like the opposite to me. It seems like the same thing. Rather than rejecting people’s intuition as ‘understandable but wrong’ why don’t we use it as the basis for a better solution?

This is the same reasoning that SQL gets criticized with. But the answer is simple.

Git (and sql) range from simple task to very complicated. Everyone likes to fantasize about making it easier but they’re only thinking about the fraction of functionality they use, rather than everything it currently does.

If someone could come up with a simpler solution they would, but they can’t because git can do extremely complicated things and is internally consistent. Most people underestimate that part

Partially because that's a much harder design challenge, especially for people with an unrelated skill set

because a) everyone's intuition is different, and b) sometimes uneducated intuition is just wrong. On a surface level things look good, but in some specific situation the intuitive way of doing things may be inconsistent or have no solution at all. In these cases you are stuck with a magic box of software that did something, you have no idea what, and you reach for a backup.

Git is not like that. It is very-very simple. If you learn basics of it, your intuition will align with git's "intuition" too, and you can do crazy things with total peace of mind, without googling or looking into source code of git to see how they had to make something "intuitive" in some definition of the word.

People writing software fall into several categories based on the problem they're solving, the reason they're solving it and the audience of the solution.

I solve my own problems for my own reasons all the time and therefore other people's intuitions are immaterial in the process. It would just slow me down to think "how would other people use this" when I'm focused on some technical personal problem.

Commercial software developers solve problems with the clear purpose of selling the solution to others and where they know ahead of time roughly what their audience's intuitions are. This is why intuitive GUI applications exist - there are whole industries devoted to finding out what people expect, what lowers cognitive load etc. iOS and Android apps give you a good idea of what is possible with modern tech when the purposes are properly aligned.

The problem here is that git was expressly developed by Linus to solve his own problem in a way that made sense to him with no thought as to how other people would use it. There were no focus groups, early betas, feedback from users and so on. At best there has been slow fixes to the porcelain to fix the stuff that bothers the people who could make a PR to git. On the other hand there are also many front-end projects that attempt to align some other person's idea of how version control is supposed to work with the Git model.

Anyway - I am in the camp where I very seldom get confused about a git thing because the actual expressed model is really simple (in the way that x86 assembly is a "simpler" language than Java). I find most front-ends much more confusing because they don't seem to work the way I expect. But I am never surprised when someone's pet project is understandable only by themselves. Or indeed when a consumer product is consumer-friendly. The real surprise is when a lone programmer makes something for themselves that then goes on to have wide appeal.

I'm still missing what part of the intuition is incorrect. It seems like the only "incorrectness" is that there's no explicit hierarchy of branches. Except that's wrong: the HEAD ref points to the default branch. Any other branches are of equal significance, though.

No, the HEAD ref points to whatever branch is "active", that's how the active branch is defined. Indeed `git checkout branchname` does nothing except make HEAD point to the commit that `refs/heads/branchname` points to.

The intuition jvns meant is the idea that a branch only constitutes the commits since the point of divergence, but every branch actually contains the full history up to the root of its tree, and `git log` of course shows that. (If you want to only show the commits specific to a branch, you can do `git log parent..branch`. Note also that two branches need not have any common history, it's perfectly possible for a git graph to be disconnected.)
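The difference between "a branch's history" and "the commits specific to a branch" comes down to reachability. Here is a sketch of the set semantics of `git log parent..branch`. The assumptions: a hypothetical dict maps each commit id to its parent ids, and git's output ordering is ignored.

```python
def ancestors(parents: dict, tip: str) -> set:
    """All commits reachable from `tip` by following parent links, tip included."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def range_commits(parents: dict, exclude: str, include: str) -> set:
    """Commits selected by `git log exclude..include` (as a set, unordered)."""
    return ancestors(parents, include) - ancestors(parents, exclude)


# Hypothetical history: main is r <- m1 <- m2, the topic branch forks at m1.
PARENTS = {"r": [], "m1": ["r"], "m2": ["m1"], "t1": ["m1"], "t2": ["t1"]}
```

`ancestors(PARENTS, "t2")` is the whole history back to the root, which is what plain `git log` on the topic branch shows. Only `range_commits(PARENTS, "m2", "t2")` narrows it down to `{"t1", "t2"}`.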

`git checkout branchname` does nothing except make HEAD point to the commit

You probably know this, but since we are being pedantic we might as well get it right: That describes "git reset". "git checkout" does that and also records that we are tracking branchname. So any commits will move both HEAD and the branchname reference.

Well along those same lines, making a commit doesn't move HEAD, it's still pointing at the same branchname as before.

  cat .git/HEAD
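Here is a sketch of the last two comments, in a toy model. HEAD holds the name of a branch (like the `ref: refs/heads/main` line in `.git/HEAD`), and the branch holds a commit id. `ToyRepo` is a hypothetical class, and parent bookkeeping is left out.

```python
class ToyRepo:
    """Hypothetical model: HEAD names a branch, the branch names a commit."""

    def __init__(self) -> None:
        self.refs = {"refs/heads/main": "c1"}
        # Same shape as the contents of .git/HEAD while on a branch.
        self.head = "ref: refs/heads/main"

    def resolve(self) -> str:
        """Follow HEAD to a commit id; a detached HEAD already is one."""
        if self.head.startswith("ref: "):
            return self.refs[self.head[len("ref: "):]]
        return self.head

    def commit(self, new_id: str) -> None:
        """Record a new commit on top of HEAD (parent bookkeeping omitted)."""
        if self.head.startswith("ref: "):
            # Attached: the branch moves; HEAD's own contents stay the same.
            self.refs[self.head[len("ref: "):]] = new_id
        else:
            # Detached: there is no branch to move, so HEAD itself moves.
            self.head = new_id
```

After `commit("c2")` on an attached HEAD, `head` still reads `ref: refs/heads/main`, but `resolve()` now returns `c2`. Both comments are right, depending on whether you mean what HEAD contains or what it resolves to.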

Intuition would be that the branch starts at the point that it diverges from main, labeled “base” in the first diagram. In reality, the first commit in “main” and “branch” are the same commit.

Intuition likely comes from how a tree (fir or oak, not binary) is structured. Generally a branch starts at the trunk or some other branch, not at the ground where the trunk gives way to roots.

I don't agree with the author here.

Intuition is travelling down main path that has branches which diverge and re-merge into the main path.

That's why people seem to intuitively get "merging" back into main, whereas that doesn't generally make sense for physical trees.

Yup. If it works, it ain't stupid.

I have found that git makes a lot more sense if you reverse the mental model of lineage. People think about a lineage going forward. But a more useful way to think is in terms of backward pointers.

A commit points to its parent(s). Since a branch is just a commit ID, you can follow the parent links backwards to find the whole history of that branch.

So a "branch point" is just where two chains of parent links converge.

The special part are merge commits. Those have multiple parents, indicating that two histories fused into one.
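The "branch point is where two chains of parent links converge" idea is what `git merge-base` computes. Below is a simplified sketch of best common ancestors. The `merge_bases` helper and the dict-of-parents representation are hypothetical, and this is not git's implementation.

```python
def ancestors(parents: dict, tip: str) -> set:
    """Commits reachable from `tip` by following parent links, tip included."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, []))
    return seen


def merge_bases(parents: dict, a: str, b: str) -> set:
    """Common ancestors of a and b that are not ancestors of another common one."""
    common = ancestors(parents, a) & ancestors(parents, b)
    return {c for c in common
            if not any(c in ancestors(parents, d) for d in common if d != c)}
```

For a fork like `r <- m1 <- m2` and `m1 <- t1 <- t2`, the result is `{"m1"}`. After a merge commit, the merged-in side's tip becomes the new convergence point.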

The issue is that if you consider a branch to be what is really the history of the branch tip, then a branch is not just the part starting from the last join with another branch. Instead it is some directed path through the commit DAG, a path that in general can’t be reconstructed from the information Git keeps.

If, for example, you have a structure like

        |
        o
       / \
      o   o
   A  |   |  B
      o   o
       \ /        
        o
       / \
      o   o
   C  |   |  D
      o   o
       \ /
        o
        |

then conceptually the path CA might be one branch and DB the other branch (or alternatively, CB and DA). But this is not something that is represented in Git’s model.
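Enumerating the lineages through that diagram shows the ambiguity concretely. The sketch assumes the top of the diagram is newest (as in `git log --graph`), and every commit label is hypothetical. The parent pointers admit four equally valid paths, so nothing in them prefers the CA/DB split over CB/DA.

```python
# Hypothetical labels: "tip" is the top merge, a1/a2 and b1/b2 are the A and B
# sides, "mid" is the middle merge, c1/c2 and d1/d2 are the C and D sides,
# and "root" is the bottom commit.
PARENTS = {
    "tip": ["a2", "b2"],
    "a2": ["a1"], "a1": ["mid"],
    "b2": ["b1"], "b1": ["mid"],
    "mid": ["c2", "d2"],
    "c2": ["c1"], "c1": ["root"],
    "d2": ["d1"], "d1": ["root"],
    "root": [],
}


def lineages(parents: dict, tip: str) -> list:
    """Every path from `tip` back to a root, following parent links."""
    if not parents[tip]:
        return [[tip]]
    return [[tip] + rest for p in parents[tip] for rest in lineages(parents, p)]


def sides(path: list) -> str:
    """Name a path by the sides it passes through, older side first (e.g. "CA")."""
    lower = "C" if "c1" in path else "D"
    upper = "A" if "a1" in path else "B"
    return lower + upper
```

`{sides(p) for p in lineages(PARENTS, "tip")}` yields all of CA, CB, DA and DB. Picking which two of them are "the branches" needs information that the parent pointers do not carry.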

a path that in general can’t be reconstructed from the information Git keeps.

Uh... yes it can. Commits have a list of 0 or more parents. That creates a DAG. There are literal hordes of tools out there that reliably interpret this, from visualizer tools to practical mutators like git bisect.

Maybe you're trying to say that no single commit order exists that traverses the whole tree. That's true, because branches can merge together. But it remains a completely interpretable graph nonetheless.

That’s not what I was saying. I was referring to the history of branch tips.

But that's not related to the DAG at all. The branch can be changed at any moment for any reason to point to any commit with any content.

But it's true that conventionally, a new branch tip should always have the previous branch tip as an ancestor. But not always as a direct parent, and even if so it might be a merge commit that joins two different branches. There is indeed no single spanning path through a DAG.

But trying to explain it as "git doesn't store enough information" to construct that spanning path seems confused to me. It's not about what git stores, it's just math: there is no such path in the general case, period.

The fact that the branch tip can be moved to unrelated commits is another issue with Git’s model, and a mismatch to the intuitive “a named lineage in the DAG” conception of branches. In other VCSs, that would be a new/different branch, and you could still rename branches so that the same name will later refer to a different branch, but the branch history as such (including renames) would be preserved.

mismatch to the intuitive “a named lineage in the DAG” conception of branches

Once more, that conception may be intuitive but it is wrong. A branch is emphatically NOT a line through the DAG, it's the whole DAG. There simply is no single list of patches to apply to get from one commit to another, even if both were at some point heads of the same branch, and even if one is an ancestor of the other.

And the reason it's wrong is that branches can merge together. You can have commit A descended from both the "main" branch and the "topic_a" branch, despite the fact that those two had diverged. This isn't a bug, it's a feature. You don't have to use it if you don't want to (lots of projects require linear commit histories in their main branch), but it's part of the tool nonetheless because some projects (Linux especially) use it heavily and to great effect.

I don’t see what is wrong. Branches in that conception are paths through the DAG. One would like to annotate such paths with names, and have those names automatically apply by default when adding a commit to the end of such a path.

This has nothing to do with lists of patches. Nevertheless, for any given path, it is always possible to compute a list of patches that would match that path. Just compute the diffs between all adjacent pairs of commits on that path. What you may mean is that you can’t replay just a single path and have it result in the same commit hash. That, of course, is correct; you need to replay the complete prefix DAG. However, I don’t see why you think that causes issues for the branches-as-paths conception.

Yes, branches can merge together. That just means that different paths can share nodes and edges. Just like two hiking trails can partially overlap. Again, I see no problem here.

The problem is that the thing that Git calls a branch does not identify a specific path. Look at OP's example again: if you conceptualize branches as paths, then CA/DB and CB/DA are distinct ways to divide that graph into branches. But in Git there is no way to represent that distinction (at least, not as branches).

No! No, they are not. This is a mistake you are making, and I'm trying (vainly, maybe) to correct it.

If you have a branched structure like that, and dump each commit as a patch, and try to trace a "path through the DAG" by applying those patches, you will find that they don't apply after the first merge commit. There is no single list of patches, because that's not the structure of the history. The merge commit "patch" must be a different delta depending on where you came from.
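Git stores full snapshots, so a "patch" only exists relative to a chosen parent. The minimal sketch below uses invented file names and contents, and a file-level diff instead of a real line diff. It shows that the same merge commit yields a different delta depending on which parent you arrive from.

```python
def delta(before: dict[str, str], after: dict[str, str]) -> dict[str, tuple]:
    """File-level diff between two snapshots: {path: (old, new)} for changed paths."""
    paths = before.keys() | after.keys()
    return {p: (before.get(p), after.get(p))
            for p in sorted(paths) if before.get(p) != after.get(p)}

# Hypothetical snapshots: each commit stores a full tree, not a patch.
base = {"a.txt": "1", "b.txt": "1"}
left = {"a.txt": "2", "b.txt": "1"}    # one side changed a.txt
right = {"a.txt": "1", "b.txt": "2"}   # the other side changed b.txt
merge = {"a.txt": "2", "b.txt": "2"}   # merge commit snapshot, parents = [left, right]

print(delta(left, merge))   # {'b.txt': ('1', '2')}
print(delta(right, merge))  # {'a.txt': ('1', '2')}
```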

Interesting, so then which path(s) does git display when running git-log on this?

Define "this". If you git-log from the commit on the top of that ASCII graph, you get all the drawn commits listed (unless adjusted with arguments such as `--no-merges` or `--first-parent`).

You can get ASCII art of that structure with:

  git log --graph --oneline

With older versions you'll also want `--decorate` to show branches and tags, but I think that's on by default now.
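The sketch below roughly models the difference between a plain `git log` and `git log --first-parent` on a tiny made-up history. It uses a simple depth-first order. Real `git log` shows commits in reverse chronological order unless told otherwise (e.g. `--topo-order`), so only which commits get listed is meant to match, not their exact order.

```python
def log(tip: str, parents: dict[str, list[str]], first_parent: bool = False) -> list[str]:
    """List commits reachable from `tip`, each once.
    With first_parent=True, follow only each commit's first parent."""
    out, stack, seen = [], [tip], set()
    while stack:
        c = stack.pop()
        if c in seen:
            continue
        seen.add(c)
        out.append(c)
        ps = parents[c][:1] if first_parent else parents[c]
        stack.extend(reversed(ps))  # push so that the first parent is visited first
    return out

# Hypothetical history: M merges A1 (first parent) and B1; T is on top of M.
parents = {"R": [], "A1": ["R"], "B1": ["R"], "M": ["A1", "B1"], "T": ["M"]}
print(log("T", parents))                     # ['T', 'M', 'A1', 'R', 'B1']
print(log("T", parents, first_parent=True))  # ['T', 'M', 'A1', 'R']
```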

Just to go off on a tangent - that's a pretty neat diagram for a throwaway comment. Was that just careful spacing in the HN textbox, or did you use a tool? Which one? :-)

“Text after a blank line that is indented by two or more spaces is reproduced verbatim. (This is intended for code.)”

Looks like this also switches to a monospaced font, which makes it easier to draw ASCII art.

  This should be rendered using a monospaced font. 
  _____
  \   /
   \ /
    O

This missing piece of information would be essentially `git reflog`, except it's not something Git sends between the clones.

You can reconstruct it manually with a combination of the parent commit order and the automatic merge commit message, if you didn't change the commit message. But yeah, that second part isn't recorded in the structure itself.

That's how I learned it, not having known anything about git or version control beforehand. I used this site:

learngitbranching.js.org/

Which represents commits as circles with arrows pointing to their parents.

Git doesn't have the concept of "main is special", but at least tools like Gitlab have protected branches to stop you screwing up too much.

Some concept of "parent" and "child" branches would actually be pretty interesting. You do have to support multiple "parent" branches though for long term support branches.

It actually does but it's very much in alpha/active development (under the umbrella of OpenSSF with the intent of being integrated into mainline git eventually).

https://github.com/gittuf/gittuf

Git itself doesn't run a persistent process and I don't see how it'd make sense to prevent a user from making arbitrary changes to their local repo, so this sounds like just another server like GitHub, Gerrit, Gitlab, etc. that already have those features.

This provides a mechanism for any client to identify and reject changes from any given remote that are non-compliant with the policy committed to the repo.

So this is something that servers would certainly be able to use, but it also operates at the client level. And you could use it in environments where nobody uses the big hosted git platforms (github, gitlab, gitea, etc.): for example, where you fetch changes to a project over ssh from a friend's or co-contributor's dev machine, or via basic, barebones read-only https or ssh hosting.

i.e. this is access control that works independent of centralized servers.

Git doesn't have the concept of "main is special"

Technically, there is special handling for both "master" and "main" in Git, in a fairly obvious but, I'd argue, not very important way. When you merge two regular branches, the commit message is `Merge branch 'source' into destination`. But not if destination is `master` or `main` – the `into ...` part is omitted for those merge commits.

But this is just for backward compatibility. Git is very conservative about changing user-facing behavior such as generated merge commit messages. To get Git to treat `master` and `main` without any special handling, set the config option `merge.suppressDest` to an empty value [1]:

    $ git config merge.suppressDest ""
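This rough Python emulation illustrates the title rule described above; it is not Git's code. It assumes, as the comment says, that an unset `merge.suppressDest` behaves like `main` and `master`, and that an empty value clears the list. It uses `fnmatchcase` as a stand-in for Git's own glob matching.

```python
from fnmatch import fnmatchcase
from typing import Optional

def merge_title(source: str, dest: str, suppress_dest: Optional[list[str]] = None) -> str:
    """Approximate the default title for merging local branch `source` into `dest`.
    `suppress_dest` mimics the merge.suppressDest globs; None means "not configured"."""
    if suppress_dest is None:
        patterns = ["main", "master"]              # assumed built-in defaults
    else:
        patterns = [p for p in suppress_dest if p]  # an empty value clears the list
    title = f"Merge branch '{source}'"
    if not any(fnmatchcase(dest, p) for p in patterns):
        title += f" into {dest}"
    return title

print(merge_title("topic", "main"))        # Merge branch 'topic'
print(merge_title("topic", "release"))     # Merge branch 'topic' into release
print(merge_title("topic", "main", [""]))  # Merge branch 'topic' into main
```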

`master` is also used as the default name for the default branch in newly created repositories. See option `--initial-branch` of `git init` and config variable `init.defaultBranch` [2] to override. Git for Windows, for example, allows setting the config option in its installer.

Source code:

For merge commit formatting: https://github.com/git/git/blob/2108fe4a1976f95821e13503fd33...

For default branch naming: https://github.com/git/git/blob/91e2ab1587d8ee18e3d2978f2b7b...

Git for Windows installer suggesting setting `init.defaultBranch`:

- https://github.com/git-for-windows/build-extra/blob/586c46ec...

Footnotes:

[1] https://git-scm.com/docs/git-merge#Documentation/git-merge.t...

[2] https://git-scm.com/docs/git-init#Documentation/git-init.txt...

There’s some special handling for FETCH_HEAD too (i.e. which branch on a remote is considered the default).

which branch on a remote is considered the default

You probably mean this place in the code: [1]. It uses the function git_default_branch_name from refs.c [2], which uses the config variable `init.defaultBranch` I mentioned above. But if that and the other look-ups fail, it does fall back to a hard-coded "refs/heads/master".

[1] https://github.com/git/git/blob/v2.43.0/remote.c#L2380-L2394

[2] https://github.com/git/git/blob/v2.43.0/refs.c#L671-L705

Edit: removed mention of a deprecated Git feature to avoid confusion.

Protecting branches is indeed very important. I make errors all the time when screwing around. It helps enormously being restricted to just messing up one's feature branches. Many other changes can be done via the GUI with PRs and the various kinds of controlled merge and rebase strategies they support, like Merge, Rebase + Merge, FF-only Merge, Squash merge, etc.

It's also a security feature. If you have a repo with a lot of developers working on it, you need to be sure they absolutely cannot slip in code with nobody noticing, or trigger CI/CD and compromise build secrets or even production.

Anything about Git reminds me of this:

https://youtu.be/EReooAZoMO0?si=sHqcYsf8v6LyWLAx

Given how many smart people are confused by Git, and how many times Git's behavior needs to be explained in a way that often raises as many questions as it answers, it seems to indicate that Git's model is not at all intuitive and doesn't map well to how people generally use it to get work done.

These are all people who have no problem understanding all kinds of other technologies and building complex systems from them.

It's not quite in "a monad is just a monoid in the category of endofunctors" territory, but when this many smart people have difficulty understanding something, I think Git is to blame, not the people.

I have to seriously ask: of the people who have issues with using or understanding git, how many of them have actually read the docs?

Git is by no means perfect but the development community is great and there is a massive focus on improving the project and making things more approachable and intuitive.

And because of that, git actually has really solid, coherent documentation with easily digestible tutorials and guides for all the things you need to do.

So it always hurts me when I see people ranting and raving about how awful git is, how it can't do x, or how it doesn't make sense how it works but then you send them the guide or tutorial hosted on the git-scm website and suddenly it makes sense.

Not to be beating the RTFM horse but RTFM guys.

Git makes version control roughly 10x more complicated than it needs to be.

I can teach an artist or designer who has never heard of version control how to use Perforce in roughly 5 minutes. They will never blow off their leg and will likely never lose work. It will probably be a few months before they hit some edge case where they need help.

Git requires building a non-trivial mental model. Then it requires memorizing a whole bunch of unintuitive commands with unintuitive flags.

Not to be beating the RTFM horse but RTFM guys.

Good tools are intuitive and can be incrementally learned without resorting to dense documentation.

RTFM is definitely a solution. But when a very large number of users have consistently similar issues at some point you have to stop blaming the users and admit the tool isn't easy to learn.

Git makes version control roughly 10x more complicated than it needs to be.

A significant part of this is that git explicitly supports more workflows than centralised VCS like subversion, perforce, or clearcase. Other than a few fairly small differences, mercurial and fossil (two other distributed VCS projects) have the same UX issues. You can't really reduce the complexity without just lopping off support. You can make that complexity more approachable/digestible (which the git devs are trying to do) but this fundamentally requires a certain willingness from the user to learn the tool.

For a lot of people git ends up being a sort of cargo cult where they get through the day with a set of magical incantations that make it work and solve their problems but without any real ability to troubleshoot when they do something wrong. Which is exactly the thing that prevents them from "gitting it".

And worth noting: I don't think I've ever worked with someone who has really gotten irreparably stuck with git after reading through the important parts of the user manual (not the references/man pages but the dedicated user manual [1]) and/or the git book [2]. There are some overly technical parts but they are towards the end of each. But if someone actually reads the entry-level user oriented parts of each/either document, odds are they'll understand enough to be able to actually solve their problems.

1. https://git-scm.com/docs/user-manual

2. https://git-scm.com/book/en/v2

A significant part of this is that git explicitly supports more workflows than centralised VCS like subversion, perforce, or clearcase.

I don't think this is quite correct. Mercurial is just as capable but has a much more accessible CLI.

Git pushes edge cases front and center. Almost all Git users use Git as a de facto centralized tool. There's a remote on GitHub, the end. Almost all users never need any feature that has to do with decentralization.

Git has somehow tricked people into thinking that version control is complicated. It's tragic.

I've read the docs. Too much explanation of command line flags, not enough practical examples. It is thorough though.

Including the git user manual [1] and the git book [2]? People tend to skip over those and go straight to the reference/manpages.

1. https://git-scm.com/docs/user-manual

2. https://git-scm.com/book/en/v2

Whenever I read the Git docs, after a while I start thinking "this is all very well and good, but it doesn't seem to be related to what I'm trying to do to get my work done (usually fairly basic things)"

Or I have read a bunch of pages on the git-scm site, and I'm thinking "oh yes, it all makes sense now." Then I'm trying to do something in the real world, and I get bizarre messages and conflicts that don't make any sense. Or I made a mistake and want to undo it, and end up in some crazy situation. The documentation doesn't seem to help in anything but an ideal textbook scenario with no mistakes and complications.

That article would have been a lot better if it showed illustrations for the "right" mental model too.

The right mental model is to realise the 'main' branch is only special by convention - git doesn't actually treat it differently from any other branch.

All of the confusion expressed in the article stems from a misunderstanding that main should work in some special way.

Of course every branch's history goes all the way back to root and not to some arbitrary common commit of another branch like 'main'. Of course rebase and merge can work "backwards" from main onto some branch (because it's not "backwards" because main is not special - it just isn't done much in practice because keeping main straight helps with collaboration)

Furthermore, by realising that main isn't inherently special, it becomes obvious that the actions can be done between any two branches as needed.

The right mental model is - it's just commits, all the way down.

"All of the confusion expressed in the article stems from a misunderstanding that main should work in some special way."

I didn't have that impression when reading that article.

To me it seems that the confusion comes from thinking in actual branches, and not from thinking anything special about main.

They're only thought of as actual branches (off of main) if you're also thinking of main as a trunk that is in some way special or different.

tl;dr Please ignore, just me working through a Python+pygit2 problem. I solved it in a grandchild comment.

I had so much trouble trying to map my intuited/mental model of git onto pygit2 that I gave up and just used the git module.

I wanted to automate a fairly simple thing in Python as opposed to bash+commands. My reasoning being that I wanted to do it "right" and be a Big Programmer Real Boy(tm). I just wanted to create a branch remotely in Github, pull the repo, and checkout the new branch. I got stuck going in circles trying to figure out why I was always left in detached HEAD state because I didn't understand exactly what git was doing during a checkout.

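    import os
    import git  # GitPython

    # repo has already been pulled
    if os.path.exists(repo_path):
        local_repo = git.Repo(path=repo_path)
        self.log.debug(f"current branch: {local_repo.active_branch.name}")
        # If only origin/<name> exists, `git checkout <name>` creates a local
        # tracking branch, so HEAD stays attached to a branch
        local_repo.git.checkout(branch_name)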

That's super easy and is much the same as running the commands in the shell or in a bash script.

Of course, I've lost my poor implementation using pygit2, so I'll add that later if I find it. Thankfully there's a good discussion surrounding the issue I encountered in this excellent "roll your own git in Python", which doesn't use pygit2, but the concepts are the same: https://www.leshenko.net/p/ugit/#checkout-switch-branches

This isn't asking someone else to make this work, it's more of a caution to convince folks like me to just use "import git" rather than pygit2:

So something like this was what I expected to work, but leaves the repo in detached head state:

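    import pygit2

    def checkout_branch(path, branch_name):
        repo = pygit2.Repository(path)

        branch_ref = repo.lookup_reference(f"refs/remotes/origin/{branch_name}")
        print(f"{branch_ref.name}")

        # refs/remotes/* is a remote-tracking ref, not a local branch, so
        # checking it out detaches HEAD at that commit
        repo.checkout(branch_ref)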

The branch_ref.name prints "refs/remotes/origin/test" but git status says "HEAD detached at origin/test"

So I'm probably feeding the wrong thing into repo.checkout, but I'm honestly not sure what else it should be.

Funnily enough, git itself tries to do the right thing if pulled in a detached head state:

    From https://github.com/testorg/example
    * [new branch]          test       -> origin/test
    You are not currently on a branch.
    Please specify which branch you want to merge with.
    See git-pull(1) for details.

        git pull <remote> <branch>

Ha, and of course just messing around gets me something that actually works.

There always seems to be just one more stackoverflow thread to read that has the real answer: https://stackoverflow.com/questions/68435607/how-to-clone-ma... (found via Kagi which I wasn't using before, and the search "pygit2 detached head")

    import pygit2

    def checkout_branch(path, branch_name):
        repo = pygit2.Repository(path)

        main_branch = repo.lookup_branch("main")
        print(f"Main branch upstream: {main_branch.upstream_name}")

        # Make sure a *local* branch (refs/heads/<name>) exists; checking out
        # the remote-tracking ref directly would detach HEAD
        if branch_name not in repo.branches.local:
            print(f"Branch {branch_name} not found in local branches")
            remote_branch = "origin/" + branch_name
            if remote_branch not in repo.branches.remote:
                raise SystemExit(f"Branch {remote_branch} not found in remote branches")
            (commit, remote_ref) = repo.resolve_refish(remote_branch)
            repo.create_reference("refs/heads/" + branch_name, str(commit.id))

        branch = repo.lookup_branch(branch_name)
        print(f"Branch name: {branch.name}")

        # Checking out a local branch leaves HEAD attached to it
        repo.checkout(branch)
        print(f"Is branch head? {branch.is_head()}")

        # Record origin/<name> as the upstream so push/pull know the remote branch
        (commit, branch_remote) = repo.resolve_refish("origin/" + branch_name)
        print(f"Remote branch: {branch_remote.name}")
        branch.upstream = branch_remote

With git reflog telling me the right thing:

    d44aedc (HEAD -> test, origin/test) HEAD@{0}: checkout: moving from main to test

And git push has the remote branch already set.

I wish there was a pair programmer AI that you had to explain stuff to. That would enable the "by explaining it, I solved it" phenomenon.

I wish there was a pair programmer AI that you had to explain stuff to. That would enable the "by explaining it, I solved it" phenomenon.

It's called rubber duck debugging, named for having an actual rubber duck at your desk you'd talk to.

Lately I’ve wanted branches (heads) to have a corresponding tail which points to the base commit that the branch sits on top of (like the commit on `main` when you created the branch).[1] Because branches get rebased all the time and eventually you have six commits out in the Æther somewhere and you have to think twice about where it even starts. And yeah you can probably think for a few seconds and recall that you have worked with John and not Jimmy on this branch so the seventh commit backwards that belongs to Jimmy must be the commit base. Or Git can tell you that the seventh commit belongs to `main` already. But why should you have to expend any effort?

You can optionally include the base commit when you send out “patches” to a mailing list.[2] Because it might not have been obvious that you based your changes on:

- The latest release

- The main development branch

- Some integration branch (probably an error)

You also need to keep the “base” in mind when you use `git range-diff`, because that tool takes two ranges like `main..previous` and `main..current`. Sometimes you can rely on just using `main..` and letting Git figure it out, but in my experience passing an explicit value sometimes works better.
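A range such as `main..current` means "commits reachable from `current` but not from `main`". Here is a minimal sketch of that set arithmetic on an invented history. It shows why the range still picks out only the topic's own commits after `main` has moved on.

```python
def reachable(tip: str, parents: dict[str, list[str]]) -> set[str]:
    """All commits reachable from `tip` via parent links, including `tip`."""
    seen, stack = set(), [tip]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def commit_range(base: str, tip: str, parents: dict[str, list[str]]) -> set[str]:
    """Set semantics of `base..tip`: reachable from tip but not from base."""
    return reachable(tip, parents) - reachable(base, parents)

# Hypothetical history: main advanced to m2 after the topic forked at m1.
parents = {"m1": [], "m2": ["m1"], "t1": ["m1"], "t2": ["t1"]}
print(sorted(commit_range("m2", "t2", parents)))  # ['t1', 't2']
```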

`git range-diff` is a super-cool but perhaps niche tool. But you basically have to use it on review round number 2 and higher when you are sending changes to the Git project.

[1] This has been discussed before and there was a patch series that implemented it. But that was basically a POC and done in the spirit of “this is useless IMO but here’s how you could do it”... and the implementation didn’t factor in all the shenanigans that you can do with `reset` and `rebase` so it couldn’t have been merged as-is. (Although to be fair: the bar was not set to work perfectly with any kind of branch reset etc., which I suspect is impossible in any case.)

[2] Patches after all are just commit messages plus the patches themselves and don’t tell you what they are based on.

It looks like this is what git merge-base --fork-point is supposed to do, although according to the docs it is not 100% reliable.

Based on all the discussions I’ve seen I think it’s impossible to programmatically find the “base” in general. Maybe it’s possible for most cases though.
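Plain `git merge-base`, by contrast, is well defined: it returns the best common ancestor(s). The sketch below computes that on a made-up history. It does not model `--fork-point`, which also consults the reflog of the upstream branch. So it will not recover the "intended" base after the upstream itself was rewritten. The unrelated root `o1` covers the disconnected-history case, where there is no base at all.

```python
def ancestors(commit: str, parents: dict[str, list[str]]) -> set[str]:
    """All commits reachable from `commit`, including itself."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(parents[c])
    return seen

def merge_bases(a: str, b: str, parents: dict[str, list[str]]) -> set[str]:
    """Best common ancestors: shared ancestors that are not ancestors of
    another shared ancestor. Empty if the two histories are disconnected."""
    common = ancestors(a, parents) & ancestors(b, parents)
    return {c for c in common
            if not any(d != c and c in ancestors(d, parents) for d in common)}

# Hypothetical history: topic forked from main at m2; o1 is an unrelated root.
parents = {"m1": [], "m2": ["m1"], "m3": ["m2"], "t1": ["m2"], "t2": ["t1"], "o1": []}
print(merge_bases("m3", "t2", parents))  # {'m2'}
print(merge_bases("t2", "o1", parents))  # set()
```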

But... If you have a rebase workflow, then `git checkout trunk; git rebase branch` is exactly how you "merge" an offshoot branch into a trunk branch! That's what Github does when you rebase-merge a PR, for example.

No, that’s not right. If you did that, you would need to force push to get the result pushed to the remote.

Oh, right. So what actually happens is that the offshoot must first be rebased on top of the trunk, and then trunk can be fast-forward merged/rebased (same thing, really) to the offshoot's head.
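Here is a toy model of those two steps. It assumes commits carry only a message and a parent list. Real rebase reapplies each commit's diff, and real IDs also hash the tree and metadata. The model shows that the rebase gives the offshoot's commits new IDs, while trunk afterwards moves by a pure fast-forward.

```python
import hashlib

def make_commit(graph: dict, parents: list[str], msg: str) -> str:
    """Store a toy commit; as with a real Git ID, the ID depends on the parents."""
    cid = hashlib.sha1(f"{parents}{msg}".encode()).hexdigest()[:7]
    graph[cid] = {"parents": parents, "msg": msg}
    return cid

def first_parent_chain(graph: dict, tip: str, stop: str) -> list[str]:
    """Commits after `stop` up to and including `tip`, oldest first."""
    chain = []
    while tip != stop:
        chain.append(tip)
        tip = graph[tip]["parents"][0]
    return chain[::-1]

def rebase(graph: dict, tip: str, old_base: str, new_base: str) -> str:
    """Copy each commit after `old_base` onto `new_base`; return the new tip.
    Only the message is copied here; real rebase reapplies each diff."""
    new_tip = new_base
    for cid in first_parent_chain(graph, tip, old_base):
        new_tip = make_commit(graph, [new_tip], graph[cid]["msg"])
    return new_tip

def is_fast_forward(graph: dict, old: str, new: str) -> bool:
    """True if `old` lies on the first-parent chain of `new` (linear histories only)."""
    while new != old:
        parents = graph[new]["parents"]
        if not parents:
            return False
        new = parents[0]
    return True

graph: dict = {}
base = make_commit(graph, [], "initial")
trunk = make_commit(graph, [base], "trunk work")
topic = make_commit(graph, [make_commit(graph, [base], "topic 1")], "topic 2")

# Step 1: rebase the offshoot onto trunk (topic's commits get new IDs).
new_topic = rebase(graph, topic, base, trunk)
# Step 2: trunk only has to move forward; nothing on trunk is rewritten.
assert is_fast_forward(graph, trunk, new_topic)
trunk = new_topic
print([graph[c]["msg"] for c in first_parent_chain(graph, trunk, base)])
# ['trunk work', 'topic 1', 'topic 2']
```

In this model, the original `topic` commits are not ancestors of the new tip. That is why the offshoot, not trunk, would need a force push if it had already been published.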

That's an excellent explanation.

> “Wrong” models can be super useful.

This is used in usability and UX design a lot. Affording mental models that don't reflect the actual code happens all the time.

This is perfectly fine, and it is the added value of a great application if it can hide the underlying reality completely. With Git, though, the abstractions are paper-thin at best. Good UIs can indeed cover up many aspects, but they only work as long as there are no merge or rebase conflicts. To resolve these correctly, the user has to have a precise picture of what is actually going on.

This is used in usability and UX design a lot.

It's the fundamental thing that makes UI work. I've always liked the title of Brenda Laurel's book, Computers as Theatre.

I've learned only one constant with git in my years as a programmer: master your own employer's git use cases, and pray to god for three things:

1. you don't change places often and thus git patterns.

2. you don't accidentally ship and commit a multi-GB file to your remote.

3. you don't change the git process on yourself and your colleagues without an extremely solid reason.

Document your chosen git patterns, even in 2023.

I always get back to this page when trying to understand/show how git works under the hood: https://eagain.net/articles/git-for-computer-scientists/

It summarizes fundamentals clearly.

If you learned from this (excellent) piece, I recommend that you buy and work through https://leanpub.com/learngitthehardway . It will take less than a day, and you'll have a much stronger foundation for a core tool.

The article goes in the right direction, but from a weird starting point. Saying things like "a branch contains the entire history" just adds to the general confusion about Git. Git does not have branches. Sure, Git emulates branches to appear familiar and intuitive, but it is actually counterproductive to use that as a starting point to explain how Git works. Git manages a graph of commits and some of those commits need human readable labels. That's it. The only thing that contains the entire history is the commit graph itself.
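Here is a toy model of that view; it does not reflect Git's on-disk format. Commits form a graph, branches are a name-to-commit table, and committing simply moves the checked-out label. The class and method names are hypothetical.

```python
class Repo:
    """Toy repository: a commit graph plus a table of human-readable labels."""

    def __init__(self) -> None:
        self.parents: dict[str, list[str]] = {}  # commit ID -> parent IDs
        self.refs: dict[str, str] = {}           # branch name -> commit ID
        self.head = "main"                       # currently checked-out branch
        self._count = 0

    def commit(self) -> str:
        """Add a commit on top of the current branch and move that label to it."""
        self._count += 1
        cid = f"c{self._count}"
        tip = self.refs.get(self.head)
        self.parents[cid] = [tip] if tip else []
        self.refs[self.head] = cid
        return cid

    def branch(self, name: str) -> None:
        """Creating a branch only adds a label for the current commit."""
        self.refs[name] = self.refs[self.head]

    def checkout(self, name: str) -> None:
        self.head = name

repo = Repo()
repo.commit()
repo.commit()              # c1 <- c2, labelled main
repo.branch("topic")
repo.checkout("topic")
repo.commit()              # c3, labelled topic
print(repo.refs)           # {'main': 'c2', 'topic': 'c3'}

repo.checkout("main")
del repo.refs["topic"]     # "deleting the branch" removes only the label
print(repo.parents["c3"])  # ['c2']: the commit and its parent link remain
```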

Great explanation. Thanks, Julia!

Another situation where the intuition breaks is that a repo can have branches without a common base (i.e., disconnected graph).

Definitely unusual, but sometimes I want to move a folder between repos while preserving the commit history of the folder[0].

[0] https://stackoverflow.com/questions/41811986/git-move-direct...

You do need to explicitly specify the other branch when merging or rebasing or making a pull request (like git rebase main), because git doesn’t know what branch you think your offshoot is based on.

I think a big issue with the presented intuition is that it's limited to wanting to merge the base/trunk/main branch into your feature branch. However, sometimes you want to merge a feature branch into another feature branch. With this in mind, you can form a better intuition, imo, where it's absolutely clear that you have to specify what branch you want to merge into another one.

Some of Julia's tweets started to get suffixed with "I don't want advice about this". It must have reached unacceptable levels.

I think that one way to "easily" understand the syntax of git is to remember that when you perform a command, you "always" modify the current branch.

For example, `git merge my-branch` will merge my-branch into the current one,

while `git rebase my-branch` will rebase the current one on top of my-branch.

There is a very good article by GitHub: https://github.blog/2020-12-17-commits-are-snapshots-not-dif...

TLDR: Think of commits as snapshots, not diffs, and you'll be fine.

We're teaching Git wrong. Most of the common confusion is due to people learning from the porcelain down to the plumbing, when it should be the other way around. If you limit your mental model to the plumbing, there's generally only one outcome that you want, but there are a dozen ways to get there from the porcelain. You can choose whichever one you prefer. But if you start from one of those dozen ways, they could each lead to a different outcome than you expected.

I'm forever grateful for one of my early internships, where a guy from GitHub visited the office and gave us a one day workshop on Git. He started from the internals and explained how Git models your codebase. (He's also the one who introduced me to the idea of plumbing vs. porcelain.) Then once we had a common language, teaching the porcelain was a matter of starting from the plumbing and working upwards, rather than the other way around.

Another invaluable resource in learning Git is this interactive tutorial [0], which renders a tree diagram of start state and desired end state and makes you write the commands (for which there are often many options!) to get to that end state. This reinforces the idea that the best way of planning Git commands is to first visualize the end state you want, and then reason about how to get there.

Also: RTFM! Not just once. Go back to it. You'll learn something new every time. The docs [1] are really good.

[0] https://learngitbranching.js.org/

[1] https://git-scm.com/docs

Mercurial is SO much better than git especially how it works at meta. I hate git.

Years ago I wrote this dynamic tutorial that visualises branches as you read: https://agripongit.vincenttunru.com

It's aimed at folks who know how to use `git add` and `git commit`, and would like to spend 15 minutes to form a mental model to help them understand what's going on.

In case it's useful to someone.

I just reread my take on branches and relearned some stuff I’d forgotten: https://peter-whittaker.com/obligatory-grokking-git-post

Warning, all text, no diagrams....

Another way to think about merging and patches: https://jneem.github.io/merging/

Reading this it seems to me that it should be incredibly easy to create an alternative version of the git client that stores the lineage per branch and can inform people if they’re doing things they probably shouldn’t.
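Here is a hypothetical wrapper illustrating that idea; nothing like it exists in Git itself, and the names are invented. It records every tip a branch has had and warns when an update does not descend from the previous tip. Git's reflog already records local tip history, but it is not shared between clones.

```python
class LineageLog:
    """Hypothetical per-branch lineage record with a warning on non-descendant updates."""

    def __init__(self, parents: dict[str, list[str]]) -> None:
        self.parents = parents
        self.history: dict[str, list[str]] = {}  # branch name -> every tip it has had

    def _descends(self, new: str, old: str) -> bool:
        """True if `old` is reachable from `new` via parent links."""
        stack, seen = [new], set()
        while stack:
            c = stack.pop()
            if c == old:
                return True
            if c not in seen:
                seen.add(c)
                stack.extend(self.parents[c])
        return False

    def update(self, branch: str, new_tip: str) -> list[str]:
        """Record a new tip for `branch`; return any warnings about the move."""
        tips = self.history.setdefault(branch, [])
        warnings = []
        if tips and not self._descends(new_tip, tips[-1]):
            warnings.append(f"{branch}: {new_tip} does not descend from {tips[-1]}")
        tips.append(new_tip)
        return warnings

parents = {"a": [], "b": ["a"], "c": ["b"], "x": ["a"]}
lineage = LineageLog(parents)
lineage.update("main", "a")
lineage.update("main", "c")
print(lineage.update("main", "x"))  # ['main: x does not descend from c']
print(lineage.history["main"])      # ['a', 'c', 'x']
```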