I do not want to learn git tricks. I just wanna use it as simple as possible. Just let me push my code and be done with git and keep on working.
Kudos to all who love git, for me, it's just a tool I have to use.
Heya, author here.
I have to admit that I learned a lot of these things fairly recently. The large repository stuff has been added into core piece by piece by Microsoft and GitHub over the last few years; it's hard to find one place that describes everything they've done. Hope it's helpful.
I've also had some fun conversations with the Mercurial guys about this. They've recently started writing some Hg internals in Rust and are getting some amazing speed improvements.
I'm also thinking of doing a third edition of Pro Git, so if there are other things like this that you have learned about Git the hard way, or just want to know, let me know so I can try to include it.
I chuckled at the title - "So You Think You Know Git" - no, one does not think one knows git :-)
Except Chuck Norris that is!
Chunk Norris
Never not a good time for this:
Git thinks it knows you.
You probably already know these bits & bobs, but I wanted to share:
[diff]
    external = difft

Use the fantastic difftastic instead of git's diff. https://difftastic.wilfred.me.uk/

[alias]
    fza = "!git ls-files -m -o --exclude-standard | fzf -m --print0 | xargs -0 git add"
    gone = "!f() { git fetch --all --prune; git branch -vv | awk '/: gone]/{print $1}' | xargs git branch -D; }; f"
    root = rev-parse --show-toplevel
Those are the most used aliases in my gitconfig. "git fza" shows a list of modified/new files in an fzf window, and you can select each file with tab plus the arrow keys. When you hit enter, those files are fed into "git add". Needs fzf: https://github.com/junegunn/fzf
"git gone" removes local branches that don't exist on the remote.
"git root" prints out the root of the repo. You can alias it to "cd $(git root)", and zip back to the repo root from a deep directory structure. This one is less useful now for me since I started using zoxide to jump around. https://github.com/ajeetdsouza/zoxide
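The "cd $(git root)" trick can also be packaged as a tiny shell function; this is just a sketch (the name "groot" is my own invention, not from the thread):

```shell
# Hypothetical helper wrapping the "git root" idea: jump back to the
# repository root from anywhere inside the work tree.
groot() {
  cd "$(git rev-parse --show-toplevel)" || return 1
}
```

Put it in your shell rc file and it works from any subdirectory of any repo, no alias-plus-subshell quoting needed.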
difftastic - amazing!
I've been wanting something like this for years...
Be prepared to hand out the difftastic URL and install instructions a lot :) I get asked "what git setting is that?" when I do diffs while sharing my screen.
Thanks for the difftastic & zoxide tips!
However, I've been using this git pager/difftool: https://github.com/dandavison/delta
While it's not structural like difft, it does produce more readable output for me (at least when scrolling fast through git log -p /scanning quickly)
This one I'm less sure about. I haven't yet gotten it to the point where I really like using it, but I'm sharing since someone might find it useful as a starting point:
[alias]
brancherry = "!f() { git checkout -b $(git rev-parse --abbrev-ref HEAD)-$(git rev-parse --short \"$1\") $1; }; f"
It's intended to be used for creating a cherry-picking branch. You give it a branch name, let's say "node", and it creates a branch with that as its parent, and the short commit hash as a suffix. So running "git brancherry node" creates the branch "node-abc1234" and switches to it. The intended workflow being you cherry-pick into that branch, create a PR, which then gets merged into the parent.
One thing about git I learned the hard way is the use of diffs and patches (more accurately, 3-way merges) for operations like merging, cherry picking and rebasing. Pro-git (correctly) emphasizes the snapshot storage model of git - it helps a lot in understanding many of its operations and quirks. But the snapshot model can cause confusion in the case of the aforementioned operations - especially rebasing.
For example, I couldn't understand why the deletion/dropping of a commit during a rebase caused changes to all subsequent commits. After all, I only asked for a snapshot to be dropped. I didn't ask for the subsequent snapshots to be modified.
Eventually, I figured out that it was operating on diffs, not snapshots (though storage was still exclusively based on snapshots). The correction on that mental model allowed me to finally understand rebasing. (I did learn later that they were 3-way merges, but that didn't affect the conclusions).
That assumption was eventually corroborated somewhere in Pro-Git or the man pages. But I couldn't find those lines again when I searched it a second time. I feel that these operations can be better understood if the diff/patch nature of those operations are emphasized a bit more. My experience on training people in rebasing also supports this.
PS: Thanks for the book! It's a fantastic example of what software documentation should look like.
Eventually, I figured out that it was operating on diffs, not snapshots
The snapshot includes all the history that led to the current snapshot. So even if you did a squash instead of dropping, you're changing everything that depends on that.
The snapshot includes all the history that led to the current snapshot
Git snapshots don't contain any history, other than the commit chain (reference to the parent commit/s) in the commit object. While the storage format is a bit complex, they behave fundamentally like a copy of the working tree at the point of commit.
So even if you did a squash instead of dropping, you're changing everything that depends on that
Squashes don't change the subsequent commits/snapshots either, other than the commit ID and chain. The tree itself remains untouched. You can verify this.
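For instance, here's a minimal scratch-repo demo of that claim (my own sketch, not from the thread): squashing rewrites the commit ID and chain, but the final tree is byte-for-byte identical.

```shell
# Squash the last two commits and compare the tree (snapshot) before/after.
set -e
cd "$(mktemp -d)" && git init -q
git config user.email you@example.com && git config user.name you
echo one > f   && git add f && git commit -qm one
echo two > f   && git add f && git commit -qm two
echo three > f && git add f && git commit -qm three
before=$(git rev-parse 'HEAD^{tree}')
git reset -q --soft HEAD~2            # squash "two" and "three"...
git commit -qm 'two+three squashed'   # ...into one new commit
after=$(git rev-parse 'HEAD^{tree}')
[ "$before" = "$after" ] && echo "same tree, different history"
```

The commit IDs differ, but `HEAD^{tree}` resolves to the same object either way.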
What I'd stress is that rebasing is nothing other than automated cherry-picking, as it's hard to imagine cherry-picking in any other way than a 3-way merge or patch operation.
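A quick scratch-repo sketch of that equivalence (my own demo; branch names are made up): replaying a feature branch's commits onto main by hand with cherry-pick ends up with the same tree that a rebase produces.

```shell
# Build a tiny repo with main and feature, then compare the two approaches.
set -e
cd "$(mktemp -d)" && git init -q
git config user.email t@example.com && git config user.name t
git commit -q --allow-empty -m base && git branch -M main
git checkout -q -b feature
echo a > a && git add a && git commit -qm "feat: a"
echo b > b && git add b && git commit -qm "feat: b"
git checkout -q main
echo m > m && git add m && git commit -qm "main moved on"
# Way 1: cherry-pick the feature commits onto main by hand.
git checkout -q -b by-hand main
git cherry-pick main..feature
# Way 2: let rebase do the same replay automatically.
git checkout -q feature && git rebase -q main
# Both end up with identical trees (same files, same contents).
[ "$(git rev-parse 'by-hand^{tree}')" = "$(git rev-parse 'feature^{tree}')" ] && echo "identical trees"
```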
Great tips and great article :+1:
One question that I have is what is happening to large file support within Git? Has that been merged into the core since Microsoft changes have also made it into core. Obviously there is a difference in supporting very many small files or a few very large files but won't it make sense to roll LFS into core as well?
What a great question. If I recall correctly, the LFS project is a Go project, which makes it difficult to integrate with Git core. However, I believe that the Git for Windows binary _does_ include LFS out of the box.
There was a discussion very recently about incorporating Rust into the Git core project that I think had a point about LFS then becoming viable to integrate, but I'd have to find the thread.
Thanks for the insight. I'm surprised to hear that LFS is Go based, I would have thought LFS predated Go - but learn something new every day! :)
In the part about whitespace diffs, you might want to mention ignore-revs-file [0]. We check an ignore-revs file into the repo, and anyone who does a significant reformat adds that SHA to the file to avoid breaking git-blame.
[0] https://git-scm.com/docs/git-blame#Documentation/git-blame.t...
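The setup is one config key; a minimal sketch (the filename ".git-blame-ignore-revs" is a common convention, not a requirement):

```shell
# Point git blame at a file of commit SHAs to skip (run inside the repo
# you want to configure; shown here in a scratch repo). The file holds
# full commit hashes, one per line; '#' comment lines are allowed.
cd "$(mktemp -d)" && git init -q
git config blame.ignoreRevsFile .git-blame-ignore-revs
```

If I recall correctly, GitHub's blame view also honors a checked-in `.git-blame-ignore-revs` at the repo root, which is part of why that particular name caught on.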
I don't think I knew this. Great tip, thanks!
I just wanted to say thanks for the entertaining talk you gave at FOSDEM and also that I appreciated the Sneakers reference :)
Ha! Yeah, I was wondering if anyone would catch that. I thought I heard a snicker or two in the audience, but I couldn't be sure.
Hey Scot,
I met you and we chatted for a bit at a bar after hours at a tech conference years ago, before you dropped, towards the end, that you were a GitHub co-founder. You actually gave me some advice that has worked out well for me. Just wanted to say thanks!
In vino veritas. Thanks for the thanks. :)
Hi Scott!
First off, I loved your presentation. And your book. As someone who actually bothers to read most of GitHub's "Highlights from Git" blog posts, I was somewhat familiar with some of them, but it was still very informative.
Also liked your side-swipe at people who prefer rebase over merge, I'm a merge-only guy myself...
I also took a look at GitButler and it looks like it could potentially solve one of my pain points.
If you're looking for things which are confusing to beginners, for a future version of your book, there are many useful / interesting / sometimes entertaining git discussions/rants here on HN. One of the recent ones is:
I love the GitHub Git blog posts. They should have a bigger audience. Taylor is a machine.
How about an RSS feed on your blog? If there's one there, it's not obvious.
https://blog.gitbutler.com/rss/
most of the time trying the main URL + /rss works
also the tag is there <link rel="alternate" type="application/rss+xml" title="GitButler" href="https://blog.gitbutler.com/rss/">
Thank you for your writing on Git over the years, particularly Pro Git, which is helpful.
Thank you for reading. :)
Hey, a little feedback on the terminal images in your posts. I'm viewing this on a phone, and it would be better if the terminal images were just the terminal (some are) and not surrounded by a large blank space showing your wallpaper. This would make it a bit easier to read on small screens, without the need to zoom in!
But then would it be as pretty?
I watched the FOSDEM talk yesterday, and I laughed hard when I heard "Who uses git blame -L? Does anybody know what that does?" because it suddenly looked like the beginning of a git wat session. But it was really informative, I learned a lot of new things! Thanks
I wrote this and wish more people followed this specific git advice:
https://mergebase.com/blog/doing-git-pull-wrong/
TLDR: don’t be afraid of rewriting history but ALWAYS do “git pull -r --autostash”
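If you don't want to rely on remembering the flags, a sketch of baking them into config so a plain "git pull" behaves that way (shown on a scratch repo; add --global to the git config calls to apply it everywhere):

```shell
# Make plain "git pull" rebase instead of merge, and stash/unstash any
# dirty working-tree changes around the rebase automatically.
cd "$(mktemp -d)" && git init -q
git config pull.rebase true
git config rebase.autoStash true
```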
I remember watching your FOSDEM talk on YouTube, where you asked whether people have rerere turned on _and_ know what it is, in one question. I have it on, but only the faintest of clues what it is! Just git things, I suppose.
I never really understood why the majority of developers insist on using the git CLI, when modern UI clients like GitKraken [0] are perfectly usable and very helpful. :shrug:
Shrug back at ya. I find the cli perfectly usable. I use an editor plugin as well but if I'm already on the command line I use it there. Having to switch to a different program just to make a commit kills the desire to commit often.
Looks neat, but I tend to get way too distracted by graphical interfaces. I assume it's really a question of personal preference. CLIs are faster to use, but have a bigger learning curve. (we will probably not solve that debate here, but I do wonder sometimes, whether to recommend the CLI or not)
Most of my git usage on the CLI is nothing fancy, just a few commands, but I keep a text file for some tips/tricks I don't use regularly.
I do as little with git as possible unless I'm facing some very specific issues, so for me at least it seems overkill to use a GUI for essentially just push, pull, and checkout.
I stopped pretending as if I know what I am doing and instead use visual Git tools, such as SmartGit or the one that comes with IntelliJ. Being a Git "command-line hero" is for show offs.
Porcelain can be just infuriatingly confusing. For example, "Yours and Theirs" can mean the opposite in different contexts. The whole user interface has no common style or theme - it needs a new "visual" layer in order to not drive one up the wall.
Same here. Personally for everyday tasks I always use a visual Git tool, specifically Tortoise Git.
For complex tasks, like fixing someone else's mess (or my own), I always start with a visual tool to look at the history and the commits, also look at the reflog (again, in a visual tool, it's much faster for me), understand what the mess is and if I can find anything to salvage, look at some diffs.
Then if it's just a commit I need to return to, I do a reset --hard. If I need to combine stuff from several commits, then I usually use the commandline.
Same. Having the diff all the time is nice, and a visual check at a glance of what is about to happen is very nice, without the need to run a bunch of extra commands.
I know enough about the app, and git in general, to get my job done. On the rare occasion I need more, I can look it up. I think I’ve only had to do that once or twice in all the years I’ve been using it.
Thanks, I knew about -committerdate but not that you can set it as default sort, super useful. A few notes...
1. git columns gets real confusing if you have more data than fits the screen and you need to scroll. Numbers would help...
2. git maintenance sounds great but since I do a lot of rebases and stuff, I am worried: does this lose loose objects faster than gc would? I see gc is disabled but it's not clear.
3. Regarding git blame a little known but super useful script is https://github.com/gnddev/git-blameall . (I mean, it's so little known I myself needed to port it to Python 3 and I am no Python developer by any stretch.)
git maintenance sounds great but since I do a lot of rebases and stuff, I am worried: does this lose loose objects faster than gc would? I see gc is disabled but it's not clear.
“gc” is disabled for the scheduled maintenance. It’s enabled as a task when running “maintenance run” explicitly.
It would not collect loose objects faster than gc would, because it just runs gc.
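A small scratch-repo sketch of that distinction (assuming a reasonably recent git with the maintenance command):

```shell
# Scheduled maintenance skips the gc task by default, but an explicit
# invocation still runs it on demand.
cd "$(mktemp -d)" && git init -q
git config user.email t@example.com && git config user.name t
git commit -q --allow-empty -m init
git maintenance run --task=gc && echo "gc ran"
```

So your loose objects are only collected when you (or a plain `git gc`) ask for it, on the same schedule as before.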
2. It does not. It will pull data from the remote more often, and repack data. With GC turned off the data won't be deleted
3. Nice, I may have to try this
Yesterday I was actually trying to find out which are the top 10 files which were having most of the modifications after they were created and I stumbled upon https://github.com/tj/git-extras/blob/main/Commands.md
Some great extra git commands are in there.
Yeah, git-extras are great. Another cool one is "git absorb":
A much simpler one, but at times also very useful is “git attic”
https://leahneukirchen.org/dotfiles/bin/git-attic
It lists files that were deleted and which commit deleted them.
Very interesting!
I'd like to see anyone else solve the challenge of many people contributing code towards different releases, different features, hotfixes, tagging releases, going back to find bugs, with an "easier" interface.
It's like people who want a low level language that hides all complexity of the system - the two are mutually exclusive. I'm happy with git, it's not that hard to learn, and some people need to just grow some (metaphorical) balls and learn git.
Great tips, thank you!
That's why I'm a huge shill for gitkraken. It's a paid product so I'm a little hesitant sometimes, but I've used them all and nothing compares to the power it unleashes. It completely lifts the curtain on the black box that many developers experience in the terminal and puts the graph front and center. It exposes rebasing operations in an effortless and intuitive visual way that makes git fun. As a result, I feel really proficient and I'm not scared of git at all. I can fix just about anything and paint the picture I want to see by carefully composing commits rather than being at the mercy of the CLI.

I still see CLI proficiency as a valuable skill, but it's so painful sometimes to watch seasoned 10 yr developers try to solve the most basic problems or completely wreck the history in a project because they're taught you can't be a real engineer if you don't use the git CLI exclusively. Lately I've resorted to arguing "use the CLI but you should at least be looking at the graph in another window throughout the day - which you can do for free in vs code, jetbrains, or even the CLI".

For example: anytime one of my teammates merges a PR, I see it and I rebase my branch right away. As a result my branch is always up to date and based on main, so I never run into merge hell or drop those awful "fix conflicts" commits in the history.
Learnt something new about core.fsmonitor. Thanks.
On the subject of large monorepos, I wish "git clone" has a resume option.
I had this issue back in 2000s when trying to clone the kernel repo on a low bandwidth connection. I was able to get the source only after asking for help on a list and someone was kind enough to host the entire repo as a compressed tar on their personal site.
I still have this problem occasionally while trying to clone a large repo on a corporate vpn that can disconnect momentarily for any reason (mainly ISP level). Imagine trying to clone the windows repo (300GB) and then losing the wifi connection for a short time after downloading 95%.
It is wild that git and docker, the two most bandwidth-intensive pieces of software in the modern development stack, don't have proper support (afaik) for resuming their downloads.
I suppose you could do this by shallow cloning and then expanding it multiple times. But yes, the fetch/push protocols really expect smaller repos or really good inet connections and servers.
I read (and upvote) anything git related by Scott Chacon. He was instrumental in me forming my initial understanding of the git model/flow more than 10 years ago, and I continue to understand things better by consuming the content he puts out. Thanks Scott!
Thanks James!
Just another git flow https://medium.com/@sbnajardhane/just-another-git-flow-90d0a...
I have vastly simplified my git workflow with some aliases that work in a variety of settings
git synced #<- sync (rebase) current branch with upstream if defined, otherwise origin. Uses master or main, preferring main
git pub #<- publish my changes to remote origin (force push with lease)
git pr #<- open appropriate PR to github (no GH client needed, just opens the URL). PR goes to upstream if defined
git hub #<- opens github page for repo
https://softwaredoug.com/blog/2022/11/09/idiot-proof-git-ali...
Keep it simple for everyone's sake. The last thing you want is to be round kicked by your version control system.
I’m a bit surprised to see this new article given Chacon’s recent popular comment here.[1] Although I guess I shouldn’t be since I noticed in his bio last time that he was working on something called “Git Butler”.
Tips and tricks are sometimes the way for developers to throw jabs at each other in the 'workflow' battles that every dev team faces. The bells and whistles are all there to run everything from a lemonade stand to a Walmart Super-Center - hence the overwhelming complexity. Just lately I started running 'git init' in all the new folders I create in my development box. Heck, it's good to see what's going on everywhere you work, not only in the designated git repos. But going back to the well known complexity of the git APIs, I recall the song that goes: "It takes a man to suffer ignorance and smile"
A tip for the tagline: drop the `column-count: 2` on `.item.is-hero .item-excerpt` (on desktop).
Also known as the "don't dead, open inside" bug.
It seems that a number of defaults in Git could be changed for the better (were it not for breaking backward compat).
A part of Git's complexity is due to the fact that it was originally meant to be just the plumbing. It was expected that more user-friendly porcelain would be written on top of the git data model. Perhaps that is still the best bet at having a simple and consistent UI. Jujutsu and Got (game of trees) are possible examples.
this describes all of unix. as soon as scripts were allowed to use commands, those commands could never be changed. lest we have a nerd riot on our hands
Creating an alternative UI is rather uncontroversial. The plumbing doesn't need replacement. Jujutsu, for example, seems to be popular.
Ha, replacement? You can't even get them to fix bugs. If you fix a bug in a unix command you'll break every script in existence and bring the world down. It's idiotic.
The user's a file! The internet's a file! Keyboard is a file! What are checkboxes? This is a volunteer project! You can't expect us to include UI in the OS! We'll just bikeshed forever so sorry, write your own, lol.
Getting a bug fixed in rsync, pulled down to LTS Ubuntu took a single email. I’m not sure what you are going on about.
Congratulations. You just broke userspace.
You clearly have no idea what you are chattering about. The saying "Don't break userspace" is for the kernel. It has nothing to do with userspace programs potentially affecting other userspace programs.
That's not a script thing, that's an API surface thing, and even then only applies to backwards-incompatible changes. You can change the arguments to git or chmod just as easily as printf() or fork()
This describes all of programming. They are called dependencies and they tend to be versioned. Breaking changes affect literally every aspect of software development. Software that isn’t maintained will no longer function at some point in the future.
That's a bold statement. Any proof or article where Linus states that?
If you mean the plumbing part, I recalled it from memory. I don't have anything from Linus to back this up. But have a look at this from the Pro-Git book [1]:
Note that its author (schacon) is also the author of the article and is replying in this discussion thread.
I also remember reading somewhere that this design was the reason for the complexity in the porcelain. Will update if I find a reference.
[1] https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Po...
Boy, I can't find this either (but also, the kernel mailing list is _really_ difficult to search). I really remember Linus saying something like "it's not a real SCM, but maybe someone could build one on top of it someday" or something like that, but I cannot figure out how to find that.
You _can_ see, though, that in his first README, he refers to what he's building as not a "real SCM":
https://github.com/git/git/commit/e83c5163316f89bfbde7d9ab23...
Here is what I found based on your lead ("real SCM", from 17 Apr 2005):
[1]: https://lore.kernel.org/git/Pine.LNX.4.58.0504170916080.7211...
Nice, yes, I think this is what I was remembering. Good find!
So, I found the git-pasky project in the _very_ early days (like a couple days after Linus's first git commits) and iirc, it was an attempt to build an SCM-like thing over the plumbing that Linus was working on:
https://marc.info/?l=linux-kernel&m=111315057710062&w=2
I wouldn't say it's very bold at all. I don't have any links, but if you've been using git for the past decade, you would have heard something along these lines. "A toolkit for building VCSs" is one thing I remember reading. There was little in the way of polish when it came to porcelain commands when people started using it. I think many people who don't use it still believe it's this way.
Well, the existence of and story behind cogito [1] should be decent proof.
[1]: https://en.m.wikipedia.org/wiki/Cogito_(software)
It's a collection of hacky tools for manipulating a DAG of objects, identified by a SHA-1 hash. If you look at it this way, you wouldn't expect any consistency in the CLI interface.
I don’t think this is a fair characterization. The reason git is confusing is that its underlying model doesn’t resemble our intuitive conceptual model of how it ought to work.
This was classic Torvalds — zero hand holding. But he gets away with it because the way git works is brilliantly appropriate for what it’s intended to do (if you just ignore the part where, you know, mere mortal humans need to use it sometimes). I ended up writing my masters thesis a decade ago about the version control wars, and I (somewhat grudgingly) came away in awe at Torvalds’ technical and conceptual clarity on this.
Is your Master’s Thesis available online?
No. The reason git is confusing is that the high-level commands have very little thought put into them, they are indeed “a collection of hacky tools to manage a DAG of objects”.
That the underlying model shines through so much is a consequence of the porcelain being half-assed and not designed. The porcelain started as a bunch of scripts to automate common tasks. The creators and users of those scripts knew exactly what they wanted done, they just wanted it done more conveniently. Thus the porcelain was developed and grouped in terms of the low level operations it facilitated.
Not true at all.
https://stevelosh.com/blog/2013/04/git-koans/
I don't totally disagree. I love Git and I find all these things very cool, but I know it's overhead a lot of people don't want. The post is on the blog of the new GUI that I'm trying to build to make the cool things that Git can do much faster and more straightforward, so maybe check it out if the CLI isn't your favorite thing.
Beyond a junior engineer, I’d expect an engineer to know more than the basics if they’ve been using git for their entire career so far.
Git is the power saw for software engineers. You don’t want someone who can’t keep all their fingers and toes anywhere near your code.
Not knowing git, when you’ve been interacting with it for years, is a red flag for me. I’m not expecting people to know the difference between rebase and rebase --onto, but they should at least know about the reflog and how to unfuck themselves.
Honestly, get outside of commit and pull and my brain feels dread like regular expressions.
Hopefully the incantation is on the Cheat Sheet and I don't make it worse.
I had the same experience for a long time and then I took a bit of time to have a deeper look behind the curtain and I have to say, once you grasp the data-model of git itself (a branch is a pointer to a commit, a commit is a pointer with metadata to a tree, a tree is...), many of the commands start to make sense all of a sudden, or at the very least "stop looking dangerous".
As it's one of those rare tools that's probably meant to stay for quite some time and we interact with quite frequently, it was time well spent for me, and it turns out it's really not as hard as the scary-looking commands imply.
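That chain can be poked at directly with a few plumbing commands; a scratch-repo sketch (the comments describe what each command prints):

```shell
# Walk the data model: branch -> commit -> tree -> blob.
cd "$(mktemp -d)" && git init -q
git config user.email t@example.com && git config user.name t
echo hi > readme && git add readme && git commit -qm init
git cat-file -t HEAD             # "commit"
git cat-file -p HEAD             # "tree <sha>" plus author and message
git cat-file -p 'HEAD^{tree}'    # "100644 blob <sha>  readme"
git rev-parse --abbrev-ref HEAD  # the branch: just a name pointing at this commit
```

Seeing that a branch is literally one line containing a commit hash is what made commands like reset and rebase stop looking dangerous to me.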
Why even use git then? `scp code server:code` does what you need.
This isn't a rhetorical question.
Because all the organizations that pay me to write code host their code centrally on GitHub
I do not want to learn programming. I just wanna use the computer as simple as possible. Let me just tell it what to do and be done with it.
Kudos to all who love programming, for me, it's just a tool I have to use.
Isn’t that where most interest starts? A computer really is a tool. I know for me, it was an unfortunate discovery at the very start of my interest in computing that to do the things I wanted I had to deal with all these tedious bits of programming.
Even today I’d like to skip most of the underlying tedious bits although I understand knowledge and willingness to deal with much of those underlying tedious bits are what keep money flowing into my account regularly. That’s about the only saving grace of it. There are so many ideas I’d love to explore but the unfortunate fact is there’s a lot of work to develop or even glue together what one needs to test out, not to mention associated infrastructure costs these days. Even useful prototypes take quite an endeavor.
I understand your sentiment, but git is really not all that hard. And knowing a few things that go beyond bog-standard checkout/commit/push, especially history-rewriting activities, will greatly improve the quality of your commit history - which might not be of much use to you, but might help other engineers working on your project make easier sense of what's going on.
And on another note, git is probably one of the longer-lasting constants in our industry. Technologies develop and change all the time, but for git, it looks like it's here to stay for a while, and it's probably one of the tools we interact with most in day-to-day dev-work. Might be worth having a bit of a look at :)
I'd argue that CVS outlasted git by at least a couple of decades...
I agree - what more is needed than push and diff and branching, and sometimes reset and rebase?
My feeling is that the git interface is a leaky abstraction. I also don't want to learn git tricks, but unfortunately I learned more about it than I wanted to.
Totally agree. However, then coworkers who don't understand even the simple git commands mess up their branches (somehow), and... then my git tricks save the day (unfortunately).
Simplicity is in the eye of the beholder. A single trick can save you a whole lot of work. Take for example interactive rebase, which allows you to merge and reorder the commits on your local branches. If you had to do everything by hand you would certainly have to work a lot more.
There is detail inherent in the problem, and some that is not. I tend to think we underestimate the inherent part.
So I'm happy for the 'complexity' of git.
same