
A Git story: Not so fun this time

cryptonector
20 replies
2d20h

> In a 2022 survey by Stack Overflow, Git had a market share of 94%, ...
>
> Never in history has a version control system dominated the market like Git. What will be the next to replace Git? Many say it might be related to AI, but no one can say for sure.

I doubt it's getting replaced. It's not just that it's got so much of the market, but also that the market is so much larger than back in the days of CVS.

It's hard to imagine everyone switching from Git. Switching from GitHub, feasible. From Git? That's much harder.

fragmede
14 replies
2d14h

Git's shortcomings are well known by this point, so "all" a successor project has to do is solve those problems. Git scales to Linux-kernel-sized projects, but it turns out there are bigger, even more complex projects out there, so it doesn't scale to Google-sized organizations. You'd want to support both centralized and decentralized operation, and be aware of both, so it would support multiple remotes while making it easier to keep them straight: is the copy on GitHub up to date with GitLab, the CI system, and my laptop and my desktop? It would have to handle binaries well, and natively, so I can check in my 100 MiB jpeg and not stuff things up. You'd want to use it both as a monorepo and as multirepos, by allowing you to check out just a subtree of the monorepo. Locally, the workflow would need to support git's complexity while also being easier to use than git.

Anyway, those are the things you'd have to hit in order to replace git, as I see them.

If you had such a system, getting people off git wouldn't be the issue - offer git compatibility, and if they don't want to use the advanced features, they can just keep using their existing workflow with git. The problem with that, though, is then: why use your new system at all?

Which gets to the point of: how do you make this exist as a global, worldwide product? FAANG-sized companies have their own internal tools teams to manage source code. Anywhere smaller doesn't have the budget to create such a thing from scratch, but...

You can't go off and make this product and then sell it to someone, because how many companies are going to adopt an unproven new workflow tool just because their engineers want it? What's the TAM of companies for whom "git's not good enough" and who have large enough pocketbooks?

Borg3
8 replies
2d9h

You are right. GIT is not a DVFS, it's a DVCS. It was made to track source code, not binary data. If you are putting binary data into a DVCS, you are doing something wrong.

But there are industries that need it, like the game industry. So they should use a tool that allows that. I heard that Plastic SCM is pretty decent at it. Never used it, so I can't tell personally.

Replacing GIT is such a stupid idea. There is no ONE tool to handle all cases. Just use the right one for your workflows. I, for example, have a need to version binary files. I know GIT handles them badly, but I really like the tool. Solution? I wrote my own simple DVFS tool for that use case: dot.exe (138KB)

It's a very simple DVFS for personal use, with peer-to-peer syncing (local, TCP, SSH). Data and metadata are SHA-1 checksummed. It's pretty speedy for my needs :) After weeks of use I liked it so much, I added pack storage to handle text files and moved all my notes from SVN to DOT :)

ozim
3 replies
20h45m

I second that way of thinking; for me, GIT is as good as it gets for versioning text files.

Not handling binary files is not a downside for me, because GIT should not be a tool for versioning binary files; we should use something else for that.

fragmede
2 replies
13h31m

What do you use when you have a tiny .png or .jpg that needs to live alongside your source code now?

ozim
0 replies
11h25m

I can put a binary file in a GIT repo, especially small ones and ones that don't change. The thing people want is "handling binary files well", whatever that means, but putting big binaries in GIT, or a lot of binary files, or versioning them, is not the use case for GIT.

nolist_policy
0 replies
12h10m

Just put it in your git repo.

Borg3
0 replies
6h37m

Yeah, I know about git-annex. It might be a good solution for big data. In my case, I do NOT want to decouple storage from metadata. I want a single repo for a single project that is self-contained. It's easier to manage, and it's truly distributed. No need to bother with backups, because every replica already has everything. It's a good model for several GBs of data.

nmz
1 replies
16h9m

DVCS stands for distributed version control system; it has nothing to do with source code specifically.

Maybe you're confusing it with SCMs, source control managers; those are the only ones that handle strictly source, but SCM can mean other things too.

Borg3
0 replies
6h40m

Hard to say... For me, a DVCS is a more advanced version of a DVFS. A DVCS can do branching and merging, and provides more metadata for revisions, etc. A DVFS does pretty much one thing: store binary blobs. And because binary blobs cannot be easily merged, I would not use it for storage here. But I guess it's just me :)

cryptonector
4 replies
2d2h

You say this, but Git has made great strides in scaling to huge repositories in recent years. You can currently do the "checkout just a subtree of the monorepo" just fine, and you can use shallow clones to approximate a centralized system (and most importantly to use less local storage).
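
For concreteness, those two features look roughly like this (the repository URL is a placeholder):

    # shallow clone: only recent history, using far less local storage
    git clone --depth=1 https://example.com/big-repo.git
    # partial clone: full history, but file contents are fetched on demand
    git clone --filter=blob:none https://example.com/big-repo.git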

> If you had such a system, getting people off git wouldn't be the issue - offer git compatibility and [...]

Git is already doing exactly that.

vlovich123
1 replies
1d

Git itself isn't though, not in any real way that matters. Having to know all the subtrees to clone in a monorepo is a usability nonstarter. You need a pseudo-filesystem that knows how to pull files on access, and one ideally integrated with the build system to offset the cost of doing remote operations on demand and to improve parallelism. Facebook is open-sourcing a lot of their work, but it's based on Mercurial. Microsoft is bought into git but AFAIK hasn't open-sourced their supporting git tooling that makes this feasible.

TLDR: the problem is more complex, and pretending that "you can checkout a subtree" solves it is missing the proverbial forest for the (sub)tree.

neerajsi
0 replies
22h26m

Microsoft's VFS for Git is open source. So is Scalar. These are the two main approaches used at Microsoft for large repos. Unfortunately, the technically superior VFS approach was a nonstarter on macOS.
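
For reference, Scalar's interface is a thin wrapper (it ships with recent git releases; the URL is a placeholder):

    # sets up partial clone, sparse checkout, and background maintenance
    # with defaults tuned for very large repositories
    scalar clone https://example.com/huge-repo.git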

nolist_policy
0 replies
22h7m

You can't (since commits are snapshots of the repo root). You can have this approximation, however:

    # partial clone: defer fetching file contents, and start with a sparse checkout
    git clone --filter=blob:none --sparse https://github.com/neovim/neovim
    cd neovim
    # materialize only the scripts/ subtree in the working copy
    git sparse-checkout add scripts

Unfortunately, GitHub does not support --filter=sparse:oid=master:scripts, so blobs will be fetched on demand as you use the repo.

jbaber
3 replies
2d19h

It does feel like asking "What will replace ASCII?" Extensions, sure, but 0x41 is going to mean 'A' in 5050 AD.

mmphosis
1 replies
18h27m

UTF-8

account42
0 replies
6h49m

That validates GP's point though: UTF-8 doesn't replace ASCII, it extends it. All valid ASCII text remains valid UTF-8 while retaining the same meaning. With the momentum behind git, it will be hard for something incompatible to replace it, but an extended git could catch on.

eliangcs
0 replies
2d11h

Author here. I don't think ASCII is the right comparison. True, it would be really hard for anything to compete with Git because a lot of the infrastructure we have is already deeply integrated with Git. But think about x86 vs. ARM and how AI might change the way we produce code.

langsoul-com
0 replies
11h33m

I really doubt that would happen. Git fails when it reaches Google-scale repos, but most of the world isn't using such large repos anyway.

A replacement would be niche, only for the huge orgs, which is usually made by them anyway. For everyone else, git is good enough.

hoistbypetard
18 replies
2d21h

Thanks for sharing a fun read.

BitKeeper was neat, and my overall take on it mirrors Larry McVoy's: I wish he had open sourced it, made his nut running something just like GitHub but for BitKeeper, and that it had survived.

I only had one interaction with him. In the early '00s, I had contributed a minor amount of code to TortoiseCVS. (Stuff like improving the installer and adding a way to call a tool that could provide a reasonable display for diffs of `.doc` and `.rtf` files.) I had a new, very niche, piece of hardware that I was excited about and wanted to add support for in the Linux kernel. Having read the terms of his license agreement for Bitkeeper, and intending to maintain my patches for TortoiseCVS, I sent him an email asking if it was OK for me to use Bitkeeper anyway. He told me that it did not look like I was in the business of version control software (I wasn't!) and said to go ahead, but let him know if that changed.

I use git all the time now because, thankfully, it's good enough that I shouldn't spend any of my "innovation tokens" in this domain. But I'd still rather have BitKeeper or Mercurial or Fossil. I just can't justify the hit that being different would impose on collaboration.

sunshowers
8 replies
2d10h

Like I tell lots of people, check out Jujutsu. It's a very Mercurial-inspired-but-better-than-it UI (the lead dev and I worked on Mercurial together for many years) with Git as one of the main supported backends. I've been using it full time for almost a year now.
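
For the curious, a minimal sketch of the Git-backend workflow (subcommand names from recent jj releases; the URL is a placeholder, and pushing normally wants a bookmark pointing at your change):

    jj git clone https://example.com/repo.git
    cd repo
    jj new                             # start a new working-copy change
    jj describe -m "Tweak the parser"  # give the change a description
    jj git push                        # sync back through the Git backend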

JoshTriplett
6 replies
1d

I would love to use Jujutsu, and it seems like a great model. I think it'd be a bad outcome if the world starts building on top of a piece of software with a single company owner and a CLA, though.

I hope that the CLA goes away one day.

sunshowers
5 replies
23h49m

Note that the CLA does not transfer copyright, so "single company owner" is not accurate from a copyright perspective.

JoshTriplett
4 replies
23h23m

It's accurate from the perspective of "there's a single company with the right to change the licensing arbitrarily".

ilyagr
3 replies
19h52m

No, it is not accurate. That is not what Google's CLA says. (Though there are other CLAs out there that are closer to what you describe)

(*Update:* Though IANAL, you should read the child comment and the CLA itself and make up your own mind. https://cla.developers.google.com/about/google-individual. The rest of my comment is mostly independent of the previous paragraph).

OTOH, IANAL, but AFAIK anyone can fork `jj` and sell a proprietary product based on jj (and distribute it under pretty much whatever license they like, with very few restrictions) because it is currently Apache licensed, but that is unrelated to the Google CLA.

Let me conjecture even more wildly about things I don't know. The following is a guess on my part.

One way to interpret this is that Google tends to publish their projects under Apache, and there is no need to demand that people transfer copyright to Google. By releasing your work under Apache, you are already giving Google (or anyone else) all the rights it needs.

AFAIK, the main purpose of the Google Individual CLA is to have you sign a statement claiming that you own the rights to your own work and didn't give up those rights to your employer.

JoshTriplett
2 replies
18h44m

> Grant of Copyright License. Subject to the terms and conditions of this Agreement, You hereby grant to Google and to recipients of software distributed by Google a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare derivative works of, publicly display, publicly perform, sublicense, and distribute Your Contributions and such derivative works.

That is a substantially more permissive license than Apache-2.0 (let alone other licenses, which apply to works to which Google also applies that CLA). That term means that Google can ignore the terms of Apache-2.0 and instead use the work under this much more permissive license, while everyone else is bound by Apache-2.0. In other words, they can do whatever they want with the code. Others could ship it in a proprietary product, sure, but they can't ignore the terms of the license while doing so.

"Permissive license" doesn't mean "do whatever you want". Apache-2.0, among other things, requires maintaining license notices,

(Note that the "and to recipients of" clause doesn't imply others can ignore the license terms, because they'd still be subject to the terms of whatever license Google puts on the software they distribute, whether that's Apache-2.0 or some proprietary license.)

So I maintain that "there's a single company with the right to change the licensing arbitrarily" is a largely accurate summary/gloss. Or, if you prefer, "there's a single company with the right to ignore the license terms".

ilyagr
1 replies
18h10m

This is a good point; you can indeed argue that "there's a single company with the right to ignore the license terms" is correct. Thank you for elaborating; I added a note to my comment.

I'm still not sure whether it really matters in light of the Apache license, but I don't feel qualified to argue about that.

I guess the straw-man I was arguing against was that some people think you transfer your copyright to Google (you don't), but that's different from what you claimed.

JoshTriplett
0 replies
16h32m

Thank you, I appreciate your followup and edit. Copyright assignment agreements are worse than CLAs, but I'm not claiming that the Google CLA includes a copyright assignment.

It matters less for something like the Apache license than it does for a copyleft license, but there are still reasons people use Apache rather than MIT or public domain, and it does include several protections people care about.

Re your edit:

> AFAIK, the main purpose of the Google Individual CLA is to have you sign a statement claiming that you own the rights to your own work and didn't give up those rights to your employer.

The Developer Certificate of Origin (DCO, what you're signing if you use a "Signed-off-by: Your Name <email@example.org>" line) serves this same purpose, isn't a CLA, and doesn't cause any of the same problems. Legal departments generally don't have concerns with developers signing a DCO, while many will rightfully prevent or restrict signing a CLA (even when they were otherwise fine with a developer contributing to Open Source in general).
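
For reference, git can add that trailer for you (a standard flag; the name and email come from your git config):

    # --signoff / -s appends "Signed-off-by: Your Name <you@example.org>"
    # built from user.name and user.email in your configuration
    git commit -s -m "Fix the widget"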

yencabulator
3 replies
2d2h

I was a heavy user of BitKeeper.

To me, Git is almost exactly like a ground-up cleaner rewrite of BitKeeper. Gitk and git-gui are essentially clones of the BitKeeper GUI.

I don't understand why you'd want to keep using BitKeeper.

cmrdporcupine
1 replies
23h40m

Conceptually, git is more powerful. But I recall the BitKeeper CLI being far more sensible in its interface.

yencabulator
0 replies
23h34m

It had its own weird quirks, and sometimes revealed that it was a front for a single file with a lot of funnily-formatted lines. We're just separated from it in time, and you can only truly hate what is familiar.

hoistbypetard
0 replies
1d23h

I think my memory is probably colored by BitKeeper being my first DVCS. I was never a heavy user of it.

I was exposed to BitKeeper when I was managing my team's CVS server. On my next team, we moved to svn, which always felt like CVS with better porcelain from a developer perspective, but when administering that server fell onto my plate, I liked it a lot better than CVS. And I thought BitKeeper would be nicer from a developer perspective.

Then on my next team, we used mercurial. I really, really, really liked mercurial, both as a developer and as a dev infrastructure administrator. It also sucked a lot less on Windows than git or BitKeeper.

The last time I had to decide for a new team, mercurial and git were the obvious options. I went with git because that was clearly what the world liked best, and because bringing new team members up to speed would require less from me that way.

All that goes to say... my direct comparison of git and BitKeeper came from when BitKeeper was mature and git decidedly was not. Then I lumped it in with Mercurial (which I really would still prefer, right now) and Fossil (ditto). You're probably exactly right about BK.

nmz
2 replies
2d15h

I wouldn't put Fossil in that list for collaboration, since it's not really a collaborative tool - or rather, there are barriers to collaboration, like creating a username for each fossil repository. That's a huge barrier in my view. It would be nice if there were something like a general auth identity that could be used everywhere, but that's still not implemented.

FWIW, Mercurial seems to have one advantage over git: support for BIG repositories, which is provided by Facebook of all people. So until Facebook moves to git, Mercurial lives on.

phyrex
0 replies
17h21m

Facebook doesn't really use vanilla Mercurial but its own scale-oriented Rust fork. It's open sourced as "Sapling".

metadat
17 replies
20h28m

> My biggest regret is not money, it is that Git is such an awful excuse for an SCM. It drives me nuts that the model is a tarball server. Even Linus has admitted to me that it’s a crappy design. It does what he wants, but what he wants is not what the world should want.

Why is this crappy? What would be better?

Edit: @luckydude Thank you for generously responding to the nudge, especially nearly instantly, wow :)

jasoneckert
9 replies
20h14m

As someone who has lived in Git for the past decade, I also fail to see why Git is a crappy design. It's easy to distribute, works well, and there's nothing wrong with a tarball server.

trhway
8 replies
19h39m

Exactly. While the article is good on the history of events, it doesn't go deep enough into the feature evolution (which is tightly connected to, and reflects, the evolution of software development). Which is:

TeamWare - somewhat easy branching (by copying the whole workspace from the parent, with bringover/putback of the changes, and a good merge tool), local history, partial commits.

BitKeeper added distributed mode and changesets.

Git added very easy branching, stash, etc.

Any other currently available source control system is usually missing at least one of those features. Very illustrative is the case of Mercurial, which emerged at about the same time, responding to the same need for modern source control, yet was missing partial commits, for example, and had much more cumbersome branching (like no local history or something like that - I last looked at it more than a decade ago). That really allowed it to be used only in very strict/stuffy settings; for everybody else it was a non-starter.

nmz
7 replies
16h25m

Git is terrible at branching; constantly squashing and rebasing is not a feature but an annoyance. See Fossil for how, by its very nature, it does proper branching/merging/logging. Not to mention that by having the repository separate from the data, it forces you to organize things in a nice way (mine looks like Project/(repo.fossil, branch1/, branch2/, branch3/)). You can achieve this with git now, but I never had to think about it in Fossil; it's a natural consequence of the design.

trhway
6 replies
15h59m

> constantly squashing and rebasing is not a feature but an annoyance

it is a feature which allows you, for example, to work simultaneously on several releases, patches, hot fixes, etc. Once a better alternative emerges, we'll jump the git ship, as we did before when we jumped onto the git ship.

> the repository separate from the data

that was a feature of a bunch of source control systems, and a reason, among others, why they lost to git.

> it forces you to

that is another reason why source control systems lose to git: git isn't forcing some narrow way of doing things upon you.

I don't deny, of course, that for some people/teams/projects other source control systems work better, as your comment illustrates. I'm just saying why git won and keeps winning in the majority of situations.

johannes1234321
4 replies
8h33m

> Once a better alternative emerges, we'll jump the git ship, as we did before when we jumped onto the git ship.

It's not that easy at this point in time. git carries a lot of momentum, especially in combination with GitHub.

Anybody learning about software development learns about git and GitHub.

Software is expected to be on GitHub.

At the time git became successful there were arguably better systems like Mercurial, and now we've got Fossil, but git's shortcomings are too small a pain point compared to the universal knowledge about it and its integration into every tool (any editor, any CI system, any package manager, ...) and process.

trhway
3 replies
8h20m

> It's not that easy at this point in time. git carries a lot of momentum, especially in combination with GitHub.

CVS back then was like this too, including public repos, etc.

> At the time git became successful there were arguably better systems like Mercurial

I specifically mentioned Mercurial above because they both emerged pretty much simultaneously, responding to the same challenges, and Mercurial happened to be inferior due to its design choices. Companies were jumping onto it too; for example, our management back then chose it, and it was a predictably huge pain in the neck, and some years down the road it was replaced with git.

johannes1234321
2 replies
8h5m

> CVS back then was like this too, including public repos, etc.

Not really.

CVS had too many flaws (no atomicity, no proper branching, no good offline work, etc.). Subversion, as the "natural successor", fixed some things and was eating away at parts of CVS.

At the same time, SourceForge, the GitHub of that time, started to alienate its users.

And back then enterprises used different tools to a much larger degree (VSS, SCCS, BK, Perforce, whatever), while that market basically doesn't exist anymore these days and git is ubiquitous.

And many people went way longer without any version control than today. Today kids learn git fundamentals very early, even on Windows, and make it a habit. Whereas in the early 2000s I saw many "professional" developers whose only versioning was the ".bak" or ".old" file, or copies of the source directory.

swiftcoder
1 replies
4h31m

> (VSS, SCCS, BK, Perforce, whatever), while that market basically doesn't exist anymore these days and git is ubiquitous.

Perforce still has a solid following in the gamedev space - even with LFS, git's handling of binaries is only mildly less than atrocious.
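
For anyone unfamiliar, the LFS workflow being referred to looks roughly like this (real git-lfs commands; the pattern and file are placeholders):

    git lfs install                  # enable the LFS smudge/clean filters
    git lfs track "*.psd"            # record the pattern in .gitattributes
    git add .gitattributes hero.psd
    git commit -m "Add art asset via LFS"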

johannes1234321
0 replies
3h20m

Yeah, but its market share shrunk a lot (especially since the market grew massively), and even Perforce is a tool that integrates with git these days.

nmz
0 replies
13h4m

> it is a feature which allows you, for example, to work simultaneously on several releases, patches, hot fixes, etc. Once a better alternative emerges, we'll jump the git ship, as we did before when we jumped onto the git ship.

What are you talking about here? I'm not talking about eliminating branching, but about the fact that merging a branch is usually just a fake single commit that hides away the complexity and decisions of the branch. See [0] for how you can leverage branches and the log for a sane commit history.

> that was a feature of a bunch of source control systems, and a reason, among others, why they lost to git.

Given the article, git won because it was FOSS, Torvalds, and speed. If you have proof of a good number of people saying "I hate the division of data and repository!" then it's a believable claim - or maybe you're confusing the data/repo division with CVS? git also didn't have to fight much; the only contender was hg.

[0]: https://fossil-scm.org/home/timeline

luckydude
6 replies
19h31m

My issues with Git:

- No rename support, it guesses (a short illustration follows this list)

- no weave. Without going into a lot of detail: suppose someone adds N bytes on a branch and then that branch is merged. The N bytes are copied into the merge node (yeah, I know, git looks for that and dedups it, but that is a slow bandaid on the problem).

- annotations are wrong: if I added the N bytes on the branch and you merged it, it will (unless this has somehow been fixed) show you as the author of the N bytes in the merge node.

- only one graph for the whole repository. This causes multiple problems:

A) The GCA (greatest common ancestor) is the repository GCA; it can be miles away from the file GCA you would get with a graph per file like BitKeeper has.

B) Debugging is upside down: you start at the changeset and drill down. In BitKeeper, because there is a graph per file, let's say I had an assert() pop. You run bk revtool on that file, find the assert, and look around to see what changed before that assert. Hover over a line and it will show you the commit comments for the file and then the changeset. You find the likely line, double-click on it, and now you are looking at the changeset. We were a tiny company, we never hit the claimed 25 people, and we supported tons of users. This form of debugging was a huge, HUGE part of why we could support so many people.

C) Commit comments are per changeset, not per file. We had a graphical check-in tool that walked you through the list of files, showed you the diffs for each file, and asked you to comment. When you got to the ChangeSet file, it asked you for what Git asks for as commit comments, but the diffs were all the file names followed by what you had just written. It made people sort of uplevel their commit comments. We had big customers that insisted the engineers use that tool rather than a command line that checked in everything with the same comment.

- submodules turned Git into CVS. Maybe that's been redone, but the last time I looked, you couldn't do sideways pulls if you had submodules. BK got this MUCH closer to correct: the repository produced identical results to a mono repository if all the modules were present (and identical minus whatever isn't populated in the sparse case), all with exactly the same semantics and functionality, mono or many repos.

- Performance. Git gets really slow in large repositories; we put a ton of work into that in BitKeeper, and we were orders of magnitude faster for things like annotate.
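
To illustrate the first bullet: git stores snapshots, so whether history follows a rename is a diff-time heuristic rather than a recorded fact. A small sketch with stock git commands:

    git mv old.c new.c && git commit -m "rename old.c"
    git log --follow new.c        # --follow infers the rename from content similarity
    git diff -M50% HEAD~1 HEAD    # rename detection with a tunable similarity threshold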

In summary, Git isn't really a version control system, and Linus admitted as much to me years ago. A version control system needs to faithfully record everything that happened, no more, no less. Git doesn't record renames, and it passes content across branches by value, not by reference. To me, it feels like a giant step backwards.

Here's another thing. We made a bk fast-export and a bk fast-import that are compatible with Git. You can have a tree in BK, have it updated constantly, and no matter where in the history you run bk fast-export, you will get the same repository. Our fast-export is idempotent. Git can't do that: it doesn't send the rename info, because it doesn't record it. That means we have to make it up when doing a bk fast-import, which means Git -> BK is not idempotent.
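
As a sketch of that round trip (only the command names come from the comment above; the target path is a placeholder and would need to be an already-initialized empty git repository):

    # stream BK history into git via the fast-import stream format
    bk fast-export | git -C /path/to/mirror fast-import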

I don't expect to convince anyone of anything at this point, someone nudged, I tried. I don't read hackernews any more so don't expect me to defend what I said, I really don't care at this point. I'm happier away from tech, I just go fish on the ocean and don't think about this stuff.

tempodox
1 replies
4h8m

What's a GCA?

gwd
0 replies
9h51m

> You run bk revtool on that file, find the assert, and look around to see what changed before that assert. Hover over a line and it will show you the commit comments for the file and then the changeset. You find the likely line, double-click on it, and now you are looking at the changeset.

I still have fond memories of the bk revtool. I haven't found anything since that's been as intuitive and useful.

anitil
0 replies
12h56m

I hadn't heard of the per-file graph concept, and I can see how that would be really useful. But I have to agree that going for a fish sounds marvellous.

account42
0 replies
7h17m

> No rename support, it guesses

Git doesn't track changes, yes; it tracks states. It has tools to compare those states, but that doesn't mean it needs to track additional data to help those tools.

I'm unconvinced that tracking renames is really helpful, as that is only the simplest case of many possible state modifications. What if you split a file A into files B and C? You'd need to be able to track that too. Same for merging one file into another. And many, many more possible modifications. It makes sense to instead focus on the states and then improve the tools that compare them.

Tracking all kinds of changes also requires all development tools to be aware of your version control. You can no longer use standard tools to do mass renames; instead you'd somehow have to build them on top of your VCS so it can track the operations. That's a huge tradeoff that tracking repository states doesn't have.

> submodules

I agree, neither submodules nor subtrees are ideal solutions.

JoshTriplett
10 replies
1d

> Tridge did the following.
>
> “Here’s a BitKeeper address, bk://thunk.org:5000. Let’s try connecting with telnet.”

Famously, Tridge gave a talk about this, and got the audience of the talk to recreate the "reverse engineering". See https://lwn.net/Articles/133016/ for a source.

> I attended Tridge's talk today. The best part of the demonstration was that he asked the audience for each command he should type in. And the audience instantly called out each command in unison, ("telnet", "help", "echo clone | nc").
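
Reconstructed from the quote above, the whole audience-driven demo was essentially:

    telnet thunk.org 5000            # connect to the BitKeeper port from the article
    help                             # the server lists the commands it accepts
    echo clone | nc thunk.org 5000   # pipe in a clone command, stream the repo back
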
luckydude
7 replies
17h39m

This is completely untrue. There is no way that you could make a BK clone by telnetting to a BK server and running commands. Those commands don't tell you the network protocol; they show you the results of that protocol but give zero insight into the protocol itself.

Tridge neglected to tell people that he was snooping the network while Linus was running BK commands when Linus was visiting his house. THAT is how he did the clone.

The fact that you all believe Tridge is disappointing, you should be better than that.

The fact that Tridge lied is disappointing but I've learned that open source people are willing to ignore morals if it gets them what they want. I love open source, don't love the ethics. It's not just Tridge.

JoshTriplett
3 replies
16h36m

> There is no way that you could make a BK clone by telnetting to a BK server and running commands. Those commands don't tell you the network protocol

The network protocol, according to multiple sources and the presented talk at LCA, was "send text to the port that's visible in the URL, get text back". The data received was SCCS, which was an understood format with existing tools. And the tool Tridge wrote, sourcepuller, didn't clone all of BitKeeper, it cloned enough to fetch sources, which meant "connect, send command, get back SCCS".

Anything more than that is hearsay that's entirely inconsistent with the demonstrated evidence. Do you have any references supporting either that the protocol was more complicated than he demonstrated on stage at LCA, or that Tridge committed the network surveillance you're claiming?

And to be clear, beyond that, there's absolutely nothing immoral with more extensively reverse-engineering a proprietary tool to write a compatible Open Source equivalent. (If, as you claim, he also logged a friend's network traffic without their express knowledge and consent, that is problematic, but again, the necessity of doing that seems completely inconsistent with the evidence from many sources. If that did happen, I would be mildly disappointed in that alone, but would still appreciate the net resulting contribution to the world.)

I appreciate that you were incensed by Tridge's work at the time, and may well still be now, but that doesn't make it wrong. Those of us who don't use proprietary software appreciate the net increase in available capabilities, just like we appreciate the ability to interoperate with SMB using Samba no matter how inconvenient that was for Microsoft.

AceJohnny2
1 replies
16h11m

Have you tried it?

The one you're replying to, @luckydude, is Larry McVoy, who created BitKeeper.

JoshTriplett
0 replies
16h6m

Fascinating, I was unaware of that link (and don't systematically check people's HN profiles before replying). Thank you for the reference; I've edited my comment to take that into account.

rbsmith
0 replies
12h2m

I worked on bk.

> The data received was SCCS, which was an understood format with existing tools.

You'd be surprised. SCCS is not broadly understood. And BK is not exactly SCCS.

I read the SourcePuller code when it was published (sp-01). It's pretty easy reading; I give Tridge credit for that. I wrote a little test and got it to check out the wrong data with no errors reported. The issue was still there in sp-02.

mpe
0 replies
7h0m

This post is BS. You should delete it.

drewdevault
0 replies
8h7m

Come on, man, you should be better than this. With so many years of hindsight, surely you realize by now that reverse engineering is not some moral failing? How much intellectual and cultural wealth is attributable to it? And with Google v. Oracle we've finally settled, even in the eyes of the law, that the externally visible APIs and behavior of an implementation are not considered intellectual property.

Tridge reverse engineering bk and kicking off a series of events that led to git is probably one of the most positively impactful things anyone has done for the software industry, ever. He does not deserve the flak he got for it, either then or today. I'm grateful to him, as we all should be. I know that it stings for you, but I hope that with all of this hindsight you're someday able to integrate the experience and move on with a positive view of this history -- because even though it didn't play out the way you would have liked, your own impact on this story is ultimately very positive and meaningful and you should take pride in it without demeaning others.

account42
0 replies
7h6m

If anything here was immoral it was locking other people's data in a proprietary tool and then denying them the ability to export it to open formats.

lathiat
1 replies
16h31m

I was there for that talk, good times. Lots of great linux.conf.au talks from Tridge over the years.

slyall
0 replies
6h56m

Same. I definitely remember the "help" line from it too.

mulmboy
4 replies
2d20h

> Additionally, Petr set up the first project homepage for Git, git.or.cz, and a code hosting service, repo.or.cz. These websites were the “official” Git sites until GitHub took over.

Is this true? I thought GitHub had no official affiliation with the git project

arp242
1 replies
2d20h

That's why "official" in in quotes. As in: "de-facto standard".

cxr
0 replies
2d18h

Not really. git-scm.org is the de facto "official" site for the Git project in about the same way that French is the de facto "official" language of France.

They meant exactly what they wrote: GitHub took over hosting duties for the official Git site (because they did).

roywashere
0 replies
2d20h

The git repo is on kernel.org nowadays with mirrors on repo.or.cz and GitHub.

But I think what they mean here is the official git project 'site', with docs and so on. And that is now https://git-scm.com/ - and indeed, as the article describes, it was initially set up by GitHub people to promote git.

jimbobthrowawy
0 replies
2d20h

I think some github employees have written code that went into git, but it's not an official affiliation.

The quotes on "official" imply non-official to me. i.e. official seeming to people who don't know any better.

cxr
4 replies
2d19h

There's a screenshot purporting to be of GitHub from May 2008. There are tell-tale signs, though, that some or all of the CSS has failed to load, and that that's not really what the site would have looked like if you visited it at the time. Indeed, if you check github.com in the Wayback Machine, you can see that its earliest crawl was May 2008, and it failed to capture the external style sheet, which results in a 404 when you try to load that copy today. Probably best to just not include a screenshot when that happens.

(Although it's especially silly in this case, since accessing that copy[1] in the Wayback Machine reveals that the GitHub website included screenshots of itself that look nothing like the screenshot in this article.)

1. <https://web.archive.org/web/20080514210148/http://github.com...>

eliangcs
2 replies
2d17h

Author here. That's a good catch, thanks! I've replaced it with a newer screenshot from August 2008.

cxr
1 replies
1d18h

Larry wants to call you and discuss two corrections to this piece ("one minor, one major"). I've already passed on your email address for good measure, but you should reach out to him.

eliangcs
0 replies
1d13h

I've emailed him to follow up. Thanks for letting me know!

philipwhiuk
0 replies
2d18h

Thanks - I was struggling to believe GitHub would have launched with something that bad looking - 2008 was not the era of CERN-style webpages!

janvdberg
3 replies
21h41m

Exceptional read! I love it.

It's the most complete history of git that I know of now. Exceptional!

I'd love to read more historical articles like this one, of pieces of software that have helped shape our world.

noufalibrahim
0 replies
16h25m

+1 to that. Great read. The field is young and accelerating. History is quite compressed. It's valuable to have articles like this.

eliasson
0 replies
21h33m

Ditto. This was a really nice read!

deskr
0 replies
19h23m

> It's the most complete history of git that I know of now.

I wasn't going to read the story until I read your comment. I knew the summary of BitKeeper and the fallout, but wow, this was so detailed. Thanks!

globular-toast
2 replies
2d8h

I've heard the story before but this was still fun to read. I didn't realise quite how rudimentary the first versions of git were. It really makes you wonder: was git the last opportunity to establish a ubiquitous version control system? Will there ever be another opportunity? Regardless of git's technical merits, one thing I'm extremely happy about is that it's free software. It seemed to come just before an avalanche of free software and really changed the way things are done (hopefully for good).

shagie
0 replies
17h46m

Two of the key features that were part of early git that show how much git was about supporting Linux kernel development:

https://git-scm.com/docs/git-am

https://git-scm.com/docs/git-send-email

Git was built around supporting the Linux kernel mailing lists. And while there are a number of other options out there that sprang up around the same time, many of them didn't fill the core need for git at that time: to reduce the stress/workload on Linus.
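
A minimal sketch of the mailing-list flow those two commands serve (addresses and filenames are placeholders):

    # contributor: turn the tip commit into a patch and mail it to the list
    git format-patch -1 HEAD
    git send-email --to=maintainer@example.org 0001-*.patch
    # maintainer: apply mailed patches directly from an mbox file
    git am ./series.mbox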

ozim
0 replies
20h9m

It created the avalanche. I don't think the scale of free software we have now would be possible without git and GitHub.

zerocrates
0 replies
20h47m

That part really should just be a straight quotation; the very light rewording of the original post feels in poor form.

aidenn0
0 replies
19h36m

The entire comment section on that post is a goldmine, thanks!

xiwenc
1 replies
2d20h

It's been a while since I actually finished reading an article this long. Very well written!

I tried to find out who the author is, or how they come to know so much. No luck. Does anyone else know, or would OP care to chip in?

superfish
1 replies
2d20h

Great read!

I'm sure I'm not the first to point out that Junio (the appointed git "shepherd") works at Google, where Mercurial is the "recommended local VCS" internally instead of git.

ilyagr
0 replies
23h58m

Large parts of Google rely on Git, most notably Chrome and Android.

Also, it is a good thing if Junio can do his job independently of Google's immediate needs.

sergius
1 replies
2d5h

This story is missing the impact that Tom Lord's TLA had on the git design.

nyanpasu64
1 replies
2d9h

FYI Mercurial's developer is now known as Olivia Mackall; sadly the Google infobox has failed to pick up the updated information.

eliangcs
0 replies
1d12h

Updated, thanks.

tretiy3
0 replies
13h25m

I have no experience with C, and I wonder: why did Linus decide that merging should be implemented in a scripting language and not in C?

throw7
0 replies
19h20m

Fun read.

The licensing of BitKeeper was a real thing. Although I don't follow the kernel mailing list at all nowadays, I remember Alan Cox calling it "buttkeeper". Good times.

rob74
0 replies
8h54m

> The bk clone/pull/push commands functioned similarly to git clone/pull/push.

That sounds a bit backwards: actually, Git works similarly to BitKeeper (I can't say to what extent, as I'm not familiar with bk), not the other way around.

mindjiver
0 replies
9h5m

This really took me back. Back then, before Git was a big thing (2010/2011-ish), I had the misfortune to work at a very large user of IBM Rational ClearCase, and it was so awful. However, it was so bad and so expensive that I managed to get tasked with "fixing it". As part of figuring out how to do this, I travelled to GitTogether 2011 from Sweden. Lots of Git folks from those days were there; at least I remember Junio, Peff and Shawn Pearce being there. I was so energised by it all that I went back and formed a small team that migrated a colossal code base (oh, the horror stories I have) over to Git over the next 2 years. The most rewarding thing I did early in my career.

So thanks to all of you who made this possible by creating Git, Gerrit and all the life-saving tools this industry was missing! The passing of Shawn Pearce was really sad, but he won't be forgotten!

michaelcampbell
0 replies
3h51m

re: licensing

> You couldn’t use BitKeeper for version control if you were working on version control software.
>
> You had to get BitMover’s permission if you wanted to run BitKeeper alongside other similar software.

That just strains credulity.

lawgimenez
0 replies
16h19m

> A heavily sedated sloth with no legs is probably faster

I'm going to borrow this phrase from now on for everything slow.

hgo
0 replies
9h35m

This is why I come to HN. Thank you to the author.

dudus
0 replies
2d16h

I had never heard the term "porcelain" before, but I liked this tidbit.

"In software development terminology, comparing low-level infrastructure to plumbing is hard to trace, but the use of “porcelain” to describe high-level packaging originated in the Git mailing list. To this day, Git uses the terms “plumbing” and “porcelain” to refer to low-level and high-level commands, respectively. "

Also, unrelated, the "Ruby people, strange people" video gave me a good chuckle.

https://www.youtube.com/watch?v=0m4hlWx7oRk&t=1080s

devdao
0 replies
3h43m

Requesting permission from your source control tool vendor to be able to continue your work is nonsense.

It's alive today! Sr.ht has categories of work you can't host too. Still marinating.

ajkjk
0 replies
21h22m

Dang this is such a good read.

account42
0 replies
8h3m

Thanks, Andrew Tridgell, for not letting the kernel get stuck with proprietary source control. An example of how sticking to your principles can make the world better in the long run, even if it annoys people at first.

JoshTriplett
0 replies
1d

> In January 2006, the X Window team switched from CVS to Git, which wowed Junio. He didn’t expect such a big project like X Window to go through the trouble of changing version control systems.

It's the "X Window System" or just "X".