
Anyone can access deleted and private repository data on GitHub

andersa
99 replies
23h26m

I reported this on their HackerOne many years ago (2018 it seems) and they said it was working as intended. Conclusion: don't use private forks. Copy the repository instead.

Here is their full response from back then:

Thanks for the submission! We have reviewed your report and validated your findings. After internally assessing the finding we have determined it is a known low risk issue. We may make this functionality more strict in the future, but don't have anything to announce now. As a result, this is not eligible for reward under the Bug Bounty program.

GitHub stores the parent repository along with forks in a "repository network". It is a known behavior that objects from one network member are readable via other network members. Blobs and commits are stored together, while refs are stored separately for each fork. This shared storage model is what allows for pull requests between members of the same network. When a repository's visibility changes (Eg. public->private) we remove it from the network to prevent private commits/blobs from being readable via another network member.
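To make the shared-object behavior concrete, here is a minimal sketch (OWNER/REPO and the SHA are placeholders; by-SHA fetches depend on server settings, but GitHub's commit view resolves the hash regardless of refs):

    # A commit that only ever existed in a since-deleted fork can still be
    # read through the upstream repo, because objects are shared network-wide.
    git clone https://github.com/OWNER/REPO.git
    cd REPO
    git fetch origin <sha-from-deleted-fork>   # by-SHA fetch, where permitted
    git show FETCH_HEAD
    # Or, without cloning, via the web UI:
    #   https://github.com/OWNER/REPO/commit/<sha-from-deleted-fork>
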
liendolucas
64 replies
23h2m

Honest question. Submitting these types of bugs only to get a: "we have determined it is known low risk issue..." seems like they really don't want to pay for someone else's time and dedication in making their product safer. If they knew about this, was this disclosed somewhere? If not I don't see them playing a fair game. What's the motivation to do this if in the end they can have the final decision to award you or not? To me it looks like similar to what happens with Google Play/Apple store to decide whether or not an app can be uploaded/distributed through them.

Edit: I brought this up because to me it is absolutely miserable for a big company to just say: "Thanks, but we were aware of this".

jonahx
34 replies
22h52m

Not defending GH here (their position is indefensible imo) but, as the article notes, they document these behaviors clearly and publicly:

https://docs.github.com/en/pull-requests/collaborating-with-...

I don't think they're being underhanded exactly... they're just making a terrible decision. Quoting from the article:

The average user views the separation of private and public repositories as a security boundary, and understandably believes that any data located in a private repository cannot be accessed by public users. Unfortunately, as we documented above, that is not always true. What's more, the act of deletion implies the destruction of data. As we saw above, deleting a repository or fork does not mean your commit data is actually deleted.

andersa
23 replies
22h38m

Based on some (admittedly not very thorough) search, this documentation was posted in 2021, three years after my report.

YetAnotherNick
22 replies
21h48m

But that would still mean they didn't intend to fix it, hence not giving a bounty is fair.

malfist
21 replies
21h27m

It's a bug bounty, not a "only if we have time to fix it" bounty.

He found a security problem, they decided not to act on it, but it was still an acknowledged security problem

madeofpalk
10 replies
21h8m

The point of a bug bounty is for companies to find new security problems.

If the (class of) problem is already known, it’s not worth rewarding.

berdario
6 replies
20h52m

I can see this argument making a bit of sense, but if they documented this 3 years after the issue was reported, they don't have a way to demonstrate that they truly already knew.

At the end it boils down to: is Github being honest and fair in answering the bug bounty reports?

If you think it is, cool.

If you don't, maybe it's not worth playing ball with Github's bug bounty process

tptacek
5 replies
20h45m

It doesn't matter if they knew. If they don't deem it a security vulnerability --- and they have put their money where their mouth is, by documenting it as part of the platform behavior --- it's not eligible for a payout. It can be a bug, but if it's not the kind of bug the bounty program is designed to address, it's not getting paid out. The incentives you create by paying for every random non-vulnerability are really bad.

The subtext of this thread is that companies should reward any research that turns up surprising or user-hostile behavior in products. It's good to want things. But that is not the point of a security bug bounty.

cycomanic
1 replies
19h47m

I would argue that even if the behaviour was as intended, at least the fact that it was not documented was a bug (and a pretty serious one at that).

tptacek
0 replies
19h10m

Again: you don't generally get bounties for finding "bugs"; you get them exclusively for finding qualified vulnerabilities.

berdario
1 replies
6h28m

That's true, but what's stopping a company from documenting a security issue as a known (mis)behaviour/bug? [*]

Companies can join/setup a bug bounty program, and just use it as a fig leaf for pretending to care about their own product/service's security.

Of course bug bounties can be and are abused daily by people who report trivial non-issues in the hope of compensation.

But in the same way, companies can also be bad actors in the way that they engage with bounties. I would usually expect big names (like Google, Apple, Github, etc.) to be trustworthy...

[*] Of course what stops companies is precisely them not being seen as trustworthy actors in the bug bounty system anymore... And for now, that's a decision that individuals have to make themselves

tptacek
0 replies
4h24m

No large company cares even a tiny bit about the money they're spending on bug bounties. They would literally lose money trying to cheat, because it would cost them more in labor to argue with people than to pay out. In reality, the bounty teams at Google and Apple are incentivized to maximize payouts, not minimize them.

If you don't trust the company running a bounty, don't participate. There are more lucrative ways to put vulnerability research skill to use.

lolinder
0 replies
19h3m

The incentives you create by paying for every random non-vulnerability are really bad.

So much this. It's pretty clear that most people commenting on this thread have never been involved in a bug bounty program on the company's side.

Bug bounty programs get a lot of reports, most of which are frankly useless and many of which are cases of intended behavior subjectively perceived as problematic. Sifting through that mess is a lot of work, and if you regularly pay out on unhelpful reports you end up with many more unhelpful reports.

This particular case definitely feels like one where the intended behavior is horribly broken, but there are absolutely many cases where "this is intended" is the only valid answer to a report.

andrewinardeer
2 replies
20h50m

If a renown company won't pay a bug bounty, a foreign government often will.

prepend
0 replies
19h56m

Good luck selling this to a foreign (or domestic) government. It doesn’t seem valuable to me, but who knows, maybe someone finds it worth a payout.

madeofpalk
0 replies
20h19m

Why would a foreign government pay for a commonly known security limitation of a product?

coldtea
6 replies
20h26m

It's a bug bounty, not a "only if we have time to fix it" bounty

It's only a bug if it's not intended

ethbr1
4 replies
17h57m

Do some companies intend for their platform to feature remote code execution?

jen20
1 replies
16h57m

Remote code execution is literally a feature of GitHub…

ethbr1
0 replies
15h1m

Sandboxed code execution is a bit different than RCE.

coldtea
1 replies
7h3m

Some might very well do. E.g. a company with a service for training hackers and security researchers.

In this case the question is moot, as this doesn't involve remote code execution.

ethbr1
0 replies
30m

Make a general point, get a general answer.

If the criteria for bug is "intended", and that's solely judged by the company, then broken auth et al. suddenly become part of their product design.

If it quacks like a bug, it's a bug.

bmitc
0 replies
12h22m

I think a lot of developers and companies interpret "that's the way the code or process works" as intentional behavior, which is not always the case.

K0balt
2 replies
9h32m

The property (“bug”) in question is an inherent and intentional property of meekly-tree type storage systems such as git.

Calling this a bug is like reporting that telnet sends information unencrypted.

The actual bug is in the way that their UX paradigm sets user expectations.

yencabulator
0 replies
2h19m

Don't blame Git for Github decisions.

Github chooses to store all "Github forks" in the same repository, and allow accessing things in that repository even when they are not reachable by the refs in the namespace of one "fork". That is purely a Github decision.

oasisaimlessly
0 replies
4h4m

s/meekly/Merkle/g

jiggawatts
8 replies
19h11m

From the article:

"We surveyed a few (literally 3) commonly-forked public repositories from a large AI company and easily found 40 valid API keys from deleted forks."

This is how your customers get their entire cloud taken over, because you made a stupid, stupid decision and instead of fixing it when warned (repeatedly!) you instead decide to just blame the customer for not reading page 537 paragraph 3 subsection B about the counter-intuitive security footgun you've left in your product.

This is negligence, pure and simple.

ndriscoll
7 replies
17h22m

If you published a key, you must assume someone copied it and that deleting references to it is not sufficient. You must rotate that key now, and should check whether it was used improperly. This is pretty basic incident response.

The thing about exposing commits that were only ever in a private repo is pretty indefensible, but not garbage collecting public commits on delete shouldn't matter.

jiggawatts
6 replies
15h53m

If you published a key

Why would anyone think that a private fork is "published"!?

This is the footgun here: The UI is telling you that nobody can see the secrets you committed to your private copy, but actually it is widely accessible.

A similar example of UI-vs-reality mismatch that I've noticed recently is the Azure Storage Account "public" visibility. By default, it uses your authenticated account for RBAC access checks, so if you click around it'll say something like "you don't have browse access". This looks secure, but attempting access anonymously would have succeeded!

I had a customer recently where this happened -- they clicked through every Storage Account to "check" them, convinced themselves they were secure, meanwhile they had database backups with PII accessible to world+dog!
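The reliable check is to attempt the access anonymously rather than trust the portal view. A minimal probe against the Blob service REST endpoint (ACCOUNT/CONTAINER are placeholders):

    # HTTP 200 here means the container lists its blobs to anonymous callers,
    # regardless of what the signed-in portal view suggested.
    curl -s -o /dev/null -w '%{http_code}\n' \
      "https://ACCOUNT.blob.core.windows.net/CONTAINER?restype=container&comp=list"
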

cellis
2 replies
15h2m

Putting keys in repos should not be done, full stop. Even if GitHub forks weren’t public, their _private_ repos could one day be compromised. Instead, store keys in a shared vault, .gitignore the .env and have a .env.example with empty keys.

jiggawatts
0 replies
12h13m

Any time I hear “shouldn’t be done” I translate that to “will happen regularly”.

I do see this regularly in my work. All but one dev team I’ve worked with over the last few years has done this.

Eisenstein
0 replies
4h45m

Don't blame the end user for doing something you don't want them to do if it is more convenient to do and works without immediate consequences. Redesign it or rethink your assumptions.

ndriscoll
1 replies
15h0m

The bit you quoted is referring to public forks that were deleted. That sounds like a non-issue to me, and I'm not at all surprised that

1. Public "forks" are just namespaced branches that share an underlying repo

2. They don't run the garbage collector all the time

I'd be surprised if those weren't true.

Like I said, the behavior with private forks sounds indefensible.

The OP is mixing together multiple things. Being able to access deleted public data isn't that surprising and definitely isn't a security issue as far as leaking keys is concerned (it was already public. Assume it has been cloned). Being able to access private forks is a footgun/issue. They should be garbage collecting as part of public repo creation so that unreferenced commits from private forks aren't included.

andersa
0 replies
8h30m

As far as I can tell, they never run the garbage collector. Code I pushed to a fork that was deleted several years ago can still be accessed through the original parent repo.

prmoustache
0 replies
11h20m

Why would anyone think that a private fork is "published"!?

Anyone who puts sensitive content in a git repo should consider it published anyway. Git is a decentralized tool; as a company you cannot control the number of git remotes that may host your code. Considering your code to be hosted only as a private repo on a specific remote git server is at best naive. This is without even considering the number of copies that are stored on dev computers.

Besides, anyone who puts stuff on third-party publicly accessible infrastructure should consider it published anyway, as breaches happen all the time.

If you happen to have api keys stored in a git repo, the only viable response is rotating those keys.

jowea
0 replies
21h8m

Shouldn't that be on the config page for the repo below the "private" button with a note saying private is not actually private if it's a fork? And ditto for delete?

kayodelycaon
13 replies
22h55m

As the article pointed out, GitHub already publicly documented this vulnerability.

My employer doesn't pay out for known security issues, especially if we have mitigating controls.

A lot of people spam us with vulnerability reports from security tools we already use. At least half of them turn out to be false positives we are already aware of. In my opinion, running a bug bounty program at all is a net negative for us. We aren't large enough to get the attention of anyone competent.

giobox
8 replies
20h59m

As the article pointed out, GitHub already publicly documented this vulnerability.

I'm honestly not yet convinced that is enough here - I've fallen victim to this without realizing it - the behaviour here is so far removed from how I suspect most users' mental model of github.com works. For me none of the exposed data is sensitive, but the point remains that I was totally unaware it would be retrievable like this.

If the behaviour flies this hard against the grain, just publishing it in a help doc is not enough, I'd argue. The linked article makes the exact same argument:

"The average user views the separation of private and public repositories as a security boundary, and understandably believes that any data located in a private repository cannot be accessed by public users. Unfortunately, as we documented above, that is not always true. Whatsmore, the act of deletion implies the destruction of data. As we saw above, deleting a repository or fork does not mean your commit data is actually deleted."
tptacek
6 replies
20h42m

The problem with this line of argument is that the fundamental workings of git are also surprising to people, such that they routinely attempt to address mistaken hazmat commits by simple reverts. If at bottom this whole story is just that git is treacherous, well, yeah, but not news.

There's a deeper problem here, which is that making the UX on hosting sites less surprising doesn't fix the underlying problem. There is a best-practices response to committing hazmat to a repository: revoke the hazmat, so that its disclosure no longer matters. You have to do this anyways. If you can't, you should be in contact with Github directly to remove it.

Cpoll
4 replies
20h9m

Is "git" relevant here? Forking isn't a git concept, and none of this behaviour has much to do with git; it's all GitHub.

Also, you can revoke an API key, but you can't revoke a company-proprietary algorithm that you implemented into a fork of a public project.

tptacek
1 replies
20h4m

Like I said: if you can't revoke the thing you committed, you need to get in touch with Github and have them remove it. That's a thing they do.

Cpoll
0 replies
2h56m

Sure, but the whole point of the article is that people don't know their "private" forks aren't private. You can't get in touch with GitHub if you've never had any indication that anything's wrong.

The solution for that is better UX.

schrodinger
1 replies
13h55m

Aside: I think it's questionable to say that forking isn't a git concept. It's just a branch on a different upstream. Those two upstreams could simply be two different folders on your machine, or a shared server.

I suppose the branding and UI for it could be a counter-argument, but then again Github allows regular branch creation / committing / merging in their UI. Their main value add (not downplaying it—it's huge) on top of git (besides ancillary things like CI / linters) is the ability to comment on a branch's diff, i.e. a PR Review.
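For illustration, a "fork" in plain-git terms can be nothing more than a clone of a folder on the same machine (paths are illustrative):

    git init --bare ~/repos/upstream.git           # the "upstream" is a folder
    git clone ~/repos/upstream.git ~/repos/fork    # the clone is the "fork"
    cd ~/repos/fork
    git checkout -b feature                        # a branch on a different upstream
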

giobox
0 replies
2h52m

There's an entire custom UX flow for forking on GH that is not part of git at all. I think it's very fair here to discuss "fork" in the specific sense Github uses it, as it's what has led to some of the issues discussed. There are absolutely means of providing fork functionality that don't have some of the problems we are discussing, but that's not how GH chose to build it.

Aeolun
0 replies
19h3m

Yeah, and we can blame a lot of that on the Git developers, but they never use words like ‘public’ and ‘private’ to indicate things they’re not.

Regardless, the vulnerability in Github forks falls squarely on Github, and is not mitigated by Git being hard to understand in the first place.

gowld
0 replies
17h55m

Two things can be true (and are):

1. GitHub has a nasty privacy/security hole, where commonsense expectations about the meanings of common words are violated by the system.

2. Github has publicly announced that they don't care about this part of user data security (private code), so they won't pay people to tell them what they already know and have announced.

Github won't pay you to tell them they are wrong when everyone already knows.

oxfordmale
2 replies
18h57m

As the author pointed out, the documentation was written three years after he reported it.

Beyond that, it is also a batshit crazy implementation. Just imagine if AWS still allowed AWS credentials to give access to a deleted account.

account42
1 replies
8h52m

The expectations for AWS and public repository hosting are not the same. If you leaked something to a public GitHub repo you should assume that it has been cloned the second you pushed it.

oxfordmale
0 replies
7h23m

This is about access to private repos, not public ones:

"Anyone can access deleted and private repository data on GitHub"

ipaddr
0 replies
21h2m

For both sides it turns into a net negative. Better to keep your bugs and use them when needed or sell them to others to use if possible.

Lets get back to what we had before when multiple people can find the same bug and exploit if needed. Now we have the one person who finds the bug it gets patched and they don't get paid.

tptacek
2 replies
20h49m

No large company running a bug bounty cares one iota about stiffing you on a bounty payment. The teams running these programs are internally incentivized to maximize payouts; the payouts are evidence that the system is working. If you're denied a payment --- for a large company, at least --- there's something else going on.

The thing to keep in mind is that large-scale bug bounty programs make their own incentive weather. People game the hell out of them. If you ack and fix sev:info bugs, people submit lots more sev:info bugs, and now your security program has been reoriented around the dumbest bugs --- the opposite of what you want a bounty program to do.

raesene9
1 replies
8h24m

In my (admittedly limited) experience, whilst payouts for bugs might be seen as a positive internally, payments for bad architecture/configuration choices are less so (perhaps as they're difficult to fix, so it's politically not expedient to raise them internally).

To provide one example I reported to a large cloud provider that their managed Kubernetes system exposed the Insecure port to the container network, meaning that anyone with access to one container automatically got cluster-admin rights. That pretty clearly seems like not a good security choice, but probably hard to fix if they were relying on that behaviour (which I'm guessing they were).

Their response was to say it was a "best practice" behaviour (no bounty applicable) and that they'd look to fix it, and they asked me not to publicly mention it. Then they deprecated the entire product 6 months later :D

That's one example but I've seen similar behaviour multiple times for things that are more architecture choices than direct bugs, which makes me think reporting such things isn't always welcome by the program owners.

tptacek
0 replies
4h21m

Repeating myself: this almost certainly has nothing at all to do with the money they'd have to give you (I assure you, if there's even a whiff of legitimacy to your report, the people managing the bounty would probably strongly prefer to pay you just to get you off their backs) and everything to do with the warped incentives of paying out stuff like this. People forget that the whole point of a bug bounty is that the rewarded bugs get fixed; the bounty is directing engineering effort. If it directs them to expensive work they already made a strategic decision not to do, the bounty is working against them.

You would prefer this company to have made a different strategic choice about what to spend engineering time on, and that's fine. But engineering cycles are finite, so whatever time they'd spend configuring K8s differently is time they wouldn't be spending on some other security goal, which, for all we know, was more important. Software is fathomlessly awful, after all.

cyrnel
2 replies
22h10m

Security disclosures are like giving someone an unsolicited gift. The receiver is obligated to return the favor.

But if you buy someone non-refundable tickets to a concert they already have tickets for, you aren't owed compensation.

account42
0 replies
8h49m

Security disclosures are like giving someone an unsolicited gift.

Exactly.

The receiver is obligated to return the favor.

Not at all. This is a very toxic expectation.

TheDong
0 replies
18h56m

Security disclosures are like telling someone they have a spot on their face. It's not always welcome, and there's no obligation on anyone to do so, nor anyone to return the favor.

In this case, the spot turned out to be a freckle, which everyone involved already knew was a freckle (since it was documented), and if anyone owes anyone anything, it's the researcher that owes github for wasting their time.

account42
2 replies
9h8m

Disagree. This is obviously a deliberate design choice with obvious implications. Expecting a bounty for reporting this is unreasonable. These kinds of beg bounties are exactly what give security "researchers" a bad name.

The security implications are also minor. The only real problem is with making a fork of a private repo public - that should only make what exists in that fork public, and not any other objects. Something that was already public staying public even when you delete it from your repo is not a security issue at all. Keys that have ever been pushed to a public repo should be revoked no matter what, with or without this GitHub feature.

barco
0 replies
4h31m

I reported a variant of this issue that (to me) was unexpected:

* You add someone to your private repo.

* After some time, you revoke their access.

As long as they keep a fork (which you can't control) they can use this same method to access new commits on the repo and commits from other private forks.

Back in 2018, this was resolved as won't-fix, but it also wasn't documented.

andersa
0 replies
8h40m

I wasn't really expecting a bounty, more so hoping they'd fix the issue. For example, to this day I keep having to tell people to never fork the Unreal Engine repository, instead making a manual copy, just in case.

This causes lots of problems for repositories that are private with the expectation that companies will make private forks with their own private changes.

Someone once pushed a bunch of console SDKs (under strict NDA) to a private fork without knowing this. Now that code is just there, if you can guess the commit hash, forever. Literally nothing can be done to remove it. Great.

93po
1 replies
22h50m

companies vary wildly in their honesty and cooperation with bug bounties and develop reputations as a result. if they have a shit reputation, people stop doing free work for them and instead focus on more honest companies

account42
0 replies
8h45m

Not all free work is wanted. Discouraging frivolous reports is exactly what is being accomplished by not paying for them.

whoknew1122
0 replies
4h57m

It's not just GitHub and it's not just because they don't want to pay bug hunters. In my career, I have escalated multiple bugs to my employer(s) in which the response was 'working as intended'. And they wouldn't have to pay me another cent if they acknowledged the issue.

In my experience, there were two reasons for this behavior:

1. They don't want to spin dev cycles on something that isn't directly related to revenue (e.g. security).

2. Developers don't have the same mindset as someone whose whole job is security. So they think something is fine when it's really not.

nyrikki
0 replies
21h17m

For moral reasons, historically I never wrote POCs or threatened disclosure.

For companies like Microsoft, whose security culture a CSRB audit deemed 'inadequate', the risk of disclosure with a POC is about the only tool we have to enforce their side of the Shared Responsibility Model.

Even the largest IT spender in the world, the US government, has moved from the carrot to the stick model. If they have to do it, so do we.

Unfortunately, as us publishing a 'bad practices' list doesn't invoke the risk of EULA-busting gross-negligence claims, responsible disclosure is one of the few tools we have.

hluska
0 replies
20h45m

The issue had been reported at least twice and was clearly documented. GitHub knew about this and had known for years. Their replies to the two notifications were even very similar.

GitHub clearly knew. Would you prefer that a vendor lie?

andersa
0 replies
22h15m

I didn't find anything mentioning it online at the time. But there wasn't much time and dedication involved either, to be fair. I discovered it completely by accident when I combined a commit hash from my local client with the wrong repository URL and it ended up working.

SnowflakeOnIce
19 replies
20h25m

There seems to be no such thing as a "private fork" on GitHub in 2024 [1]:

A fork is a new repository that shares code and visibility settings with the upstream repository. All forks of public repositories are public. You cannot change the visibility of a fork.

[1] https://docs.github.com/en/pull-requests/collaborating-with-...

ff7c11
8 replies
17h40m

A fork of a private repo is private. When you make the original repo public, the fork is still a private repo, but the commits can now be accessed by hash.

CGamesPlay
7 replies
17h25m

According to the screenshot in the documentation, though, new commits made to the fork will not be accessible by hash. So private feature branches in forks may be accessible via the upstream that was changed to public, if those branches existed at the time the upstream's visibility changed, but new feature branches made after that time won't be accessible.

pcthrowaway
6 replies
15h53m

OK but say a company has a private, closed source internal tool, and they want to open-source some part of it. They fork it and start working on cleaning up the history to make it publishable.

After some changes which include deleting sensitive information and proprietary code, and squashing all the history to one commit, they change the repo to public.

According to this article, any commit on either repo which was made before the 2nd repo was made public, can still be accessed on the public repo.

reisse
3 replies
7h53m

After some changes which include deleting sensitive information and proprietary code, and squashing all the history to one commit, they change the repo to public.

I know this might look like a valid approach at first glance but... it is stupid to anyone who knows how git or the GitHub API works. The remote (GitHub's) reflog is not GC'd immediately; you can try to get commit hashes from the events history via the API, and then try to get the commits from the reflog.
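For example, the public Events API exposes push-event commit SHAs that survive force-pushes (OWNER/REPO are placeholders):

    # SHAs from past push events remain queryable even after the branch
    # history is rewritten; the objects can then be fetched by hash.
    curl -s "https://api.github.com/repos/OWNER/REPO/events" \
      | jq -r '.[] | select(.type == "PushEvent") | .payload.commits[].sha'
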

Perseids
1 replies
3h26m

it is stupid for anyone who knows how git or GitHub API works?

You need to know how git works and GitHub's API. I would say I have a pretty good understanding about how (local) git works internally, but was deeply surprised about GitHub's brute-forceable short commit IDs and the existence of a public log of all reflog activity [1].

When the article said "You might think you’re protected by needing to know the commit hash. You’re not. The hash is discoverable. More on that later." I was not able to deduce what would come later. Meanwhile, data access by hash seemed like a non-issue to me – how would you compute the hash without having the data in the first place? Checking that a certain file exists in a private branch might be an information disclosure, but is not usually problematic.

And in any case, GitHub has grown so far away from its roots as a simple git hoster that implicit expectations change as well. If I self-host my git repository, my mental model is very close to git internals. If I use GitHub's web interface to click myself a repository with complex access rights, I assume they have concepts in place to thoroughly enforce these access rights. I mean, GitHub organizations are not a git concept.

[1] https://www.gharchive.org/
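For reference, the brute force is as blunt as it sounds: GitHub's commit view resolves short hashes, and the 4-hex-character prefix space is only 65,536 values. A naive sketch (OWNER/REPO are placeholders; ignores rate limiting):

    for p in $(printf '%04x\n' $(seq 0 65535)); do
      code=$(curl -s -o /dev/null -w '%{http_code}' \
        "https://github.com/OWNER/REPO/commit/$p")
      [ "$code" = "200" ] && echo "hit: $p"   # a resolvable object exists here
    done
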

reisse
0 replies
4m

You need to know how git works and GitHub's API.

No; just knowing how git works is enough to understand that force-pushing squashed commits or removing branches on remote will not necessarily remove the actual data on remote.

GitHub API (or just using the web UI) only makes these features more obvious. For example, you can find and check a commit referenced in MR comments even if it was force-pushed away.

was deeply surprised about GitHub's brute-forceable short commit IDs

Short commit IDs are not GitHub feature, they are git feature.

If I use GitHub's web interface to click myself a repository with complex access rights, I assume they have concepts in place to thoroughly enforce these access rights.

Have you ever tried to make a private GitHub repository public? There is a clear warning that code, logs and activity history will become public. Maybe they should include an additional clause about forks there.

marcosdumay
0 replies
4h29m

Yes, even though I expect there to be people that do exactly what the GP describes, if you know git it has severe "do not do that!" vibes.

Do not squash your commits and make the repository public. Instead, make a new repository and add the code there.

sickblastoise
0 replies
6h33m

Why not just create a new public repo and copy all of the source code that you want to it?

Log_out_
0 replies
1h13m

ChatGPT, given the following repo, create a plausible, perfect commit history for this repository.

Manuel_D
5 replies
20h9m

Not through the GitHub interface, no. But you can copy all files in a repository and create a new repository. IIRC there's a way to retain the history via this process as well.

shkkmo
0 replies
19h52m

All you should have to do is just clone the repo locally and then create a blank GitHub repository, set it as the/a remote and push to it.

mckn1ght
0 replies
19h54m

You can create a private repository on GitHub, clone it locally, add the repo being "forked" from as a separate git remote (I usually call this one "upstream" and my "fork", well, "fork"), fetch and pull from upstream, then push to fork.
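Roughly, the commands for that workflow (repo names illustrative, remote names as in the comment):

    git clone git@github.com:me/private-copy.git
    cd private-copy
    git remote rename origin fork                   # my private "fork"
    git remote add upstream git@github.com:them/original.git
    git fetch upstream
    git merge upstream/main                         # or rebase
    git push fork main
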

make3
0 replies
19h58m

That's not the GitHub concept / almost trademark of "fork" anymore though, which is what your parent was talking about

a1o
0 replies
19h57m

I mean it's git, just git init, git remote add for origin and upstream, origin pointing to your private, git fetch upstream, git push to origin.

JyB
0 replies
19h36m

That’s beside the point. The article is specifically about « GitHub forks » and their shortcomings. It’s unrelated to pushing to distinct repositories not magically « linked » by the GH « fork feature ».

rkagerer
2 replies
13h20m

Am I the only one who finds this conceptually confusing?

rocqua
1 replies
11h56m

Nope, me too. The whole repo network thing is not user-facing at all. It is an internal thing at GitHub to allow easier pull requests between repos. But it isn't a concept git knows, and it doesn't affect GitHub users at all except for this one weird thing.

brazzledazzle
0 replies
9h36m

I may be recalling incorrectly but I seem to remember it having some storage deduplication benefits on the backend.

kayodelycaon
4 replies
22h57m

What does "private fork" mean in this context? I created a fork of a project by cloning it to my own machine and set origin to an empty private repository on GitHub. I manually merge upstream changes on my machine.

Is my repository accessible?

swozey
0 replies
21h49m

Because you never git-pushed to the fork, it's not aware of your repo; you're OK.

What I don't know is: if in 3 months you DO set your remote origin to that fork to, for instance, pull upstream patches into your private repo, you're still not pushing, only pulling, so I would THINK they'd still never get your changes, but I don't know if git does some sort of log sync when you do a pull as well.

Maybe that would wind up having the commit hash available.

masklinn
0 replies
21h48m

It’s not. The feature here works because a network of forks known by GitHub has a unified storage, that’s what makes things like PRs work transparently and keep working if you delete the fork (kinda, it closes the PR but the contents don’t change).

dathinab
0 replies
21h23m

then it's fine

the issue is the `fork` mechanism of github is not semantically like a `git clone`

it's more like creating a larger git repo in which all forks, whether private or not, are contained, and which doesn't properly implement access management (at least points 2&3 wouldn't be an issue if they did)

there are also some implications from point 1 that forks in some way interfere with GC-ing orphaned commits (e.g. the non-synced commits in the deleted repo in point 1); at the least that should be a bug IMHO, one which also costs them storage

(also to be clear for me 2&3 are security vulnerabilities no matter if they are classified as intended behavior)

andersa
0 replies
22h48m

No, that would be the "copy the repository" approach. A private fork is when you do it through their UI.

As far as I know, it is not accessible.

tedivm
2 replies
23h19m

I reported a different security issue to github, and they responded the same (although they ultimately ended up fixing it when I told them I was going to blog about the "intended behavior").

myfonj
0 replies
22h26m

What "intended behaviour" was that, specifically?

_heimdall
0 replies
18h37m

Did you end up getting a bug bounty out of it?

jeremyjh
1 replies
22h51m

It would not even be that hard to fix it; private forks should always just be automatically copied on first write. You might lose your little link to the original repo, but that's not as bad as unintentionally exposing all your future content.

sundalia
0 replies
22h29m

Yup, we can close the thread and ack that GitHub does not care.

fullstackchris
1 replies
20h36m

To be fair, in the true git sense, if a "fork" is really just a branch, deleting the original completely would also mean deleting every branch (fork) completely

obviously not a fan of this policy though

bogota
0 replies
12h7m

But a fork is really not a branch. It’s a copy of a repo with one remote pointing at the original on GitHub, but that doesn’t need to happen.

WA
0 replies
13h41m

Conclusion: don't use private forks. Copy the repository instead.

My conclusion would be: don’t use GitHub.

HenryBemis
0 replies
3h37m

Imho there is an issue with the word "delete". Apparently the approach of anyone who is hosting someone else's (private and/or sensitive and/or valuable) data is to hide it from view, but keep it around "just in case" or "because we can" or "what are you gonna do about it".

I 'love' it when I see the words "hide", "archive", "remove", and other newspeak to avoid using the word "delete", since 'they' never actually delete (plus there are 1-2-5-10-forever years' worth of backups from which your 'deleted' info can be retrieved relatively easily).

hackerbirds
50 replies
22h44m

Users should never be expected to know these gotchas for a feature called "private", documented or not. It's disappointing to see GitHub calling it a feature instead of a bug, to me it just shows a complete lack of care about security. Privacy features should _always_ have a strict, safe default.

In the meantime I'll be calling "private" repos "unlisted", seems more appropriate

layer8
22 replies
22h16m

I'll be calling "private" repos "unlisted"

The same for “deleted” repos.

NullPrefix
21 replies
19h37m

"deleted" is just a fancy word "inaccessible to the user"

callalex
18 replies
19h23m

No, it really isn’t. Anyone who uses that word that way is just factually incorrect, and probably pretty irresponsible depending on the context. Software should not tell lies.

dumbo-octopus
17 replies
19h8m

delete: remove or obliterate (written or printed matter), especially by drawing a line through it or marking it with a delete sign

Which is, indeed, what every modern database does.

8organicbits
14 replies
17h33m

I think you are referring to tombstoning. That's usually a temporary process that may immediately delete the underlying data, keeping a tombstone to ensure the deletion propagates to all storage nodes. A compaction process purges the underlying data (if still present) and the tombstones after a suitable delay. It's a fancy delete that takes some time to process, but the data is eventually gone. You could turn off the compaction, if you wanted.

I believe Kafka makes deletion difficult, since it's an append-only log, but Kafka doesn't work well with laws that require deletion of data, so I don't believe it's a popular choice any longer (i.e. isn't modern).

dumbo-octopus
13 replies
12h36m

If you run a DELETE FROM in any modern SQL engine, which is the absolute best you could expect when asking for a delete in the UI^, the data is nowhere near gone. It’s still in all the backups, all the WALs, all the transactions that started before yours, etc. It’s marked for eventual removal, and that’s it. Just as the definition of delete I provided says.

^ (more likely they’ll just update the table to set a deleted flag)

8organicbits
6 replies
6h0m

eventual removal

To me, the idea that the deletion takes time to complete doesn't negate the idea that the data will be gone once the process completes.

WAL archive and backups are external systems. You could argue that nothing supports deletion because an external backup could exist, but that's not a useful conversation.

dumbo-octopus
5 replies
5h33m

Going back to the point of the the thread, we agree the deleted data is not erased. The user is unable to access it through normal mechanisms, but the existence of side channels that could reveal it does not negate the idea that it has truly been “deleted”, especially when one looks at the historical context surrounding that word.

8organicbits
4 replies
5h1m

What? I don't agree with that.

Can you point to an example of a modern database that "supports deletion" but keeps the data around forever? Maybe I've just used different tools than you. Knowing modern data retention concerns I'd be surprised if such a thing existed.

dumbo-octopus
3 replies
4h16m

Who said anything about that? We’re talking about side channels and eventual^TM deletion. Given enough time no information will remain anywhere, sure. But that’s not very relevant.

8organicbits
2 replies
3h50m

I think we are trying to define the word "delete". You found an archaic definition and are trying to use it in a modern technical setting. You've claimed that modern databases delete without actually removing data but haven't pointed to which systems you are talking about. I'm familiar with tombstoning, either as a "soft-delete" or as part of an eventual deletion process. But I've never seen that called deletion as that would be very confusing.

Pointing to which database you are talking about should clear this up quickly.

I don't think it's reasonable to talk about backups here. A backup is external to the database so it inherently cannot delete it. Similar to how a piece of paper cannot destroy a photograph of the paper, but burning the paper destroys it.

dumbo-octopus
1 replies
3h41m

I used the first definition of delete I found which, while arguably “archaic”, matches the modern technical term almost exactly. We’d typically call that a well known word with a clear meaning.

And sure, the DELETE FROM statement in postgres - or any other standards-compliant SQL DB I know.

8organicbits
0 replies
3h17m

In technical writing you often don't want to use the dictionary for definitions, similar to how words in a contract can have unexpected meaning in a legal setting.

For Postgres you've got to consider vacuum. Auto vacuum is enabled by default. Deleted rows are removed unless you go out of your way to make it do something different.

UweSchmidt
5 replies
10h3m

Imagine the data that was deleted is of the highest level of illegality you can imagine. Under no circumstance can your service be associated with that content.

- What was your "definition of delete" again?

- You mentioned some of the convenient technical defaults your frameworks and tools provide out-of-the-box, can you think of ways to improve the situation?

(You might re-run delete requests after restoring a backup; transactions should resolve in a timely fashion, failed deletes can be communicated to the user quickly, etc.)

dumbo-octopus
4 replies
5h40m

We are missing the point here. The GP was claiming that delete meant something other than adding a mark to an item that you want to eventually be removed from the system. It doesn’t.

UweSchmidt
3 replies
5h5m

I understand that you describe the status quo in many systems today.

However, besides the technical aspect you talked about the "absolute best you could expect when asking for a delete in the UI^".

I think this is where I, other posters in the thread, most people, and probably the GDPR and other legislation, would disagree. We expect significantly more effort to clean up deleted data.

This includes, for example, the ability to delete datasets from backups, as well as a general accountability of how often and where all the data is stored and if, and when a deletion process is complete.

dumbo-octopus
2 replies
4h20m

GDPR and other legislature

Nope. GDPR allows deleted data to be retained in backups so long as there is an expiration process in place. Doesn’t matter how long it is. But certainly nobody has a right to force a company to pull all of their backups from cold storage and trawl through them all any time any deletion request takes place. That’d be the quickest path to Distributed Denial of Bank Account Funds imaginable. Even the GDPR isn’t that bone-headed.

But yes, it is part of the law that the provider should tell you that your data isn’t actually being erased and instead it will be kept around until they get around to erasing everything as part of their standard timelines. But that knowledge doesn’t do anyone much good.

CNIL confirmed that you’ll have one month to answer a removal request, and that you don’t need to delete a backup set in order to remove an individual from it.

https://blog.quantum.com/2018/01/26/backup-administrators-th...

hunter2_
0 replies
4h10m

But GitHub is keeping this stuff indefinitely. No long expiration, no probability of eventual disk overwriting, nothing. All they're doing is shutting the front door without shutting the side door.

UweSchmidt
0 replies
2h35m

Interesting point about the GDPR; I will soften my point to mean that lawmakers have started (late) to regulate data retention / deletion and the rights of users in general and that might be a trend for the future.

However I would like to avoid the impression that with the description of the technical status quo the topic is settled. To do so I would go back to my previous point: imagine some truly illegal pictures are in that cold storage backup, and one day you might have to restore that data. (Since apparently the user's wish to delete data is not quite as respected as certain other hard legal requirements regarding content.)

What solutions to mitigate the situation could a company, or backup tool/web framework etc. reasonably come up with? Maybe check the restored data against a list of hashes/IDs of to-be-deleted-data?

mdavidn
0 replies
15h41m

Every modern file system works like this too. Then there’s copy-on-write snapshotting and SSD wear leveling to worry about. Data isn’t actually destroyed until the space is reused to store something else at an indeterminate point in the future.

Or when its encryption key is overwritten.

But it probably is a good idea to stop returning deleted data from web APIs.

cottsak
0 replies
15h34m

this is why when I'm building confirm UI, I prefer the term "destroy?" on the confirm action. It's much clearer to the user that this is a destructive and irreversible action and we will be removing this data/state.

*obviously doesn't apply to soft deletes.

stubish
0 replies
11h12m

No, deleted is a word for deleted. But we started saying things were "deleted", while our eyes flicked to the stack of backup tapes in the corner, acknowledging the white lie, because really deleting things conflicted with other priorities and was hard. And we left it there, until privacy regulations came along and it turned out not using the normal definition of deleted could get you sued. So IMO Github is wide open to paying damages to the first person able to demonstrate them.

Dylan16807
0 replies
12h28m

It's tolerated for there to be temporary inaccessible copies sticking around when something is deleted.

What GitHub is doing here is neither temporary nor inaccessible.

chrisandchris
14 replies
22h38m

Yep, I see GitHub as "public only" hosting, and if I want to host something private, I will choose another vendor.

stvltvs
6 replies
22h33m

Which vendors work best for private projects?

tracker1
0 replies
22h24m

You could consider GitLab... though this only seems to affect private forks of public repos.

the8thbit
0 replies
20h8m

I've used both Bitbucket and Azure in the corporate world.

t-writescode
0 replies
13h8m

I've been happy with Jetbrains Space (now Space Code); but I'm using it for private, professional work and paying for it, so perhaps that isn't what you mean.

prmoustache
0 replies
11h14m

gitea works well. Use that on your own network.
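For instance, a minimal Gitea instance with Docker (ports/volume illustrative):

    docker run -d --name gitea \
      -p 3000:3000 -p 2222:22 \
      -v gitea-data:/data \
      gitea/gitea:latest    # web UI on :3000, git-over-ssh on :2222
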

chrisandchris
0 replies
8h5m

JetBrains Space, Atlassian Bitbucket, GitLab (also On-Premises), Gitea

Order does not indicate any preference.

Ragnarork
0 replies
10h7m

Sourcehut :)

OutOfHere
3 replies
22h30m

The noted issue looks to be applicable to forks only, not to all private repos.

eslaught
0 replies
21h24m

It also applies to this situation:

    1. Create a private repo R
    2. Create a private fork F of R
    3. Push commits to the fork F
    4. Make R public
The commits pushed to F prior to R being made public will become de facto public, even though F has always been a private fork. The post makes clear that commits pushed to F after R is made public are placed into a separate, private fork network.

So basically, if you ever intend to open source anything, never do it to an existing private repo. Always start a from-scratch repo to be the root of your new public project.
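A repro sketch of that sequence with the GitHub CLI (names illustrative; R is created in an org so it can be forked into your user account, and the visibility flip is done in the web UI):

    gh repo create my-org/R --private      # 1. private repo R
    gh repo fork my-org/R --clone          # 2. private fork F (clones into ./R)
    cd R
    git commit --allow-empty -m "pre-publication work"
    git push                               # 3. push commits to the fork
    # 4. flip my-org/R to public (Settings -> Danger Zone). The commit above
    #    is now fetchable by SHA through my-org/R, even though F stays private.
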

chrisandchris
0 replies
7h59m

I find the attitude worrying. I understand that it's maybe not easy to fix, or even fixable without breaking some use cases.

However, if they "don't care" about such an issue, how can I trust them to care about other stuff?

EugeneOZ
0 replies
11h31m

Github’s attitude towards, and perception of, the terms “privacy” and “security” - that is what matters more.

dheera
1 replies
19h46m

Or commit an ecryptfs volume.

Clone and mount, unmount and commit
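Something like the following, assuming ecryptfs-utils is installed (paths are illustrative; mounting a directory over itself is the usual ecryptfs pattern):

    git clone git@github.com:me/encrypted-notes.git && cd encrypted-notes
    mkdir -p vault
    sudo mount -t ecryptfs ./vault ./vault   # interactive passphrase/options
    # ... edit the plaintext view under ./vault ...
    sudo umount ./vault                      # only ciphertext files remain
    git add -A && git commit -m "update" && git push
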

1oooqooq
0 replies
10h31m

Extremely annoying, but it's the only truly private option on somebody else's computer.

i read headlines like the above with the implied "not just to the employees there anymore"

account42
0 replies
8h32m

if I want to host something private, I will choose another vendor.

Or you know, self-host, preferrably on-prem.

Basic git hosting only needs a sshd running on the server. If you want collaborative features with a web UI then there are solutions for that available too.
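The minimal version really is just a bare repo behind sshd (host and path illustrative):

    # on the server
    ssh you@myserver 'git init --bare /srv/git/project.git'
    # on each client
    git remote add origin you@myserver:/srv/git/project.git
    git push origin main
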

est
4 replies
17h26m

It's disappointing to see GitHub calling it a feature instead of a bug

git is a "distributed" version control software afterall. It means a peer can't control everything.

Osiris
3 replies
16h41m

Anyone at your company can just push to a public git repository at any time. Nothing is stopping them except the threat of consequences.

account42
2 replies
8h27m

So? Employees with access to sensitive data are capable of leaking that data. News at eleven!

And anyone in the world can pull what was pushed to a public git repo before you delete it. You should always assume that has happened.

oxfordmale
1 replies
7h14m

This is about access to private repos, not public ones:

"Anyone can access deleted and private repository data on GitHub"

account42
0 replies
6h38m

You might have noticed that my comment is a reply to another comment.

barnabee
3 replies
6h48m

Disagree. If you're using a service, understand how it works.

Not everything needs to be designed for idiots and lazy people; it's ok for some tools and services, especially those aimed at technical people and engineers, to require reading to use properly and to be surprising or unintuitive at first glance.

niam
2 replies
6h2m

There's got to be a word for these kinds of ridiculous arguments which use personal responsibility as a cudgel against a systematic fix.

I agree generally that interfaces have been dumbing down too far, but "private is actually not private and it's on you for not knowing that, idiot B)" is a weird place to be planting that flag.

barnabee
1 replies
5h17m

There should probably also be a word for the belief that when a system doesn't work how you want it to, that is so obviously a systematic problem that needs fixing rather than, for example, evidence of differing goals or priorities that it is reasonable to describe anyone who thinks otherwise as ridiculous.

phito
0 replies
4h21m

That means having an opinion

epolanski
0 replies
8h48m

I see your point; on the other hand, the standard procedure for that in the GitHub UI is to create a repo and then select another as a template.

That doesn't fork, but it does what you would expect: a fully private repo.

catalypso
0 replies
18h59m

I'll be calling "private" repos "unlisted"

That might be a bit too strict. I'd still expect my private repos (no forks involved) to be private, unless we discover another footnote in GH's docs in a few years ¯\_(ツ)_/¯

But I'll forget about using forks except for publicly contributing to public repos.

Users should never be expected to know these gotchas for a feature called "private".

Yes, the principle of least astonishment[0] should apply to security as well.

[0] https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...

CGamesPlay
0 replies
17h20m

Specifically about the feature called "private", the only gotcha seems to be that when the upstream transitions from private to public, it may unexpectedly take more data public than desired, right? The other discussed gotchas were all about deleted public data not actually being deleted or made inaccessible.

jonahx
17 replies
23h8m

Surprised at the comments minimizing this.

I've used github for a long time, would not have expected these results, and was unnerved by them.

I'd recommend reading the article yourself. It does a good job explaining the vulnerabilities.

hyperpape
12 replies
22h55m

For the first two, git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.

I can sympathize with someone who gets bit by it, as it might not have occurred to them, but it’s part of the model.

The third strikes me as counter-intuitive and hard to reason about.

P.S. If you publish your keys or access tokens for well known services to GitHub and you are prominent enough, they will be found and exploited in minutes. The idea that deleting the repository is a security measure is not really worth taking seriously.

jonahx
4 replies
22h47m

I agree the 3rd is by far the worst of the offenders. But even the first two should have more visibility. For example, by notifying users during deletion of forked repos that data will still be available.

The exact UX here is debatable, but I don't think security warnings buried in the docs is enough. They should be accounting for likely misunderstandings of the model.

hyperpape
3 replies
21h11m

Even if it wasn't forked, it could be cloned. Should that be part of the warning?

I wouldn't mind a disclaimer when you delete a repository that any information that repository ever contained is likely to have already been downloaded and stored. Per the comment I added, I'm not sure it would really help that much, but it would not be harmful.

jonahx
2 replies
20h33m

Should that be part of the warning?

It couldn't hurt, but that isn't the misunderstanding I'm worried about.

As described in the first example of the article, you can make a fork, commit to it, delete your entire fork, and yet the data will still be accessible via the parent repo, even though no one ever forked or cloned or saw your fork. That is not intuitive at all.

You can say "Well just consider any data that has ever been public compromised forever", and indeed you should, but this behavior is still surprising and could bite devs even if they know they should follow the advice in that quote.

Consider a situation like this...

Dev forks, accidentally pushes a secret or some proprietary code in a commit, and immediately deletes the fork. They figure it was only up for a very short time, now it's gone, risk someone saw it is low. They don't bother rotating, because that would be a major operational pain (and yes, it shouldn't be, but for many orgs it is).

Is this dev making a mistake? Of course. That's not good security thinking. But their assessment of the risk being low might actually be correct if their very reasonable mental model of deletion were correct. But the unintuitive way GH works means that the actual risk is much higher than their reasoning led them to believe.

prepend
0 replies
19h32m

It couldn't hurt, but that isn't the misunderstanding I'm worried about.

I think lots of warnings lead to people ignoring the warnings. So it could hurt by making people less aware of other warnings.

hyperpape
0 replies
20h7m

As described in the first example of the article, you can make a fork, commit to it, delete your entire fork, and yet the data will still be accessible via the parent repo, even though no one ever forked or cloned or saw your fork. That is not intuitive at all.

But isn't that only the third vulnerability, that private forks are implicitly made public?

As I said, I won't defend that decision.

dogleash
2 replies
22h11m

git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.

No. That doesn't make sense. It only sounds vaguely plausible at first because content addressable storage often means a distributed system where hosting nodes are controlled by multiple parties. That's not the case here; we're only talking about one host.

Imagine we were talking about a (hypothetical) NetFlix CDN where it's content addressed rather than by UUID. Would anyone say "they forgot to check auth tokens for Frozen for one day, therefore it makes sense that everyone can watch it for free forever"?

hyperpape
1 replies
21h28m

Since Netflix neither allows anonymous users to fully download Frozen without DRM, nor allows authorized users to upload derivative works that are then redistributed to the public, I think there may be some relevant differences here.

debugnik
0 replies
20h20m

They do remove content when their licence expires, though. So imagine instead Netflix allowing users to find and watch expired series by hash, then telling the copyright owners they can't fully delete the series because something something content-addressing.

dathinab
2 replies
21h43m

For the first two, git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.

this isn't quite right

content addressable storage is just a means of access; it does:

- not imply content cannot be deleted

- not imply content cannot be access managed

you could apply this to a git repo itself (like making some branches private and some not), but more importantly, forks are not git ops, they are higher-level GitHub ops and could very well have appropriate safeguards to make sure this cannot happen

e.g. if GitHub had implemented forks like a `git clone`, _none of these vulnerabilities would have been a thing_
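
For illustration, a minimal sketch of what a clone-based "fork" looks like, with no shared object store between the two repositories (all URLs and names here are hypothetical):

    # clone the upstream into a fully independent repository
    git clone https://github.com/upstream/project.git
    cd project
    # add a second remote for your own copy and push there
    git remote add myfork git@github.com:me/project.git
    git push myfork main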

similarly, implementing different access rights for different subsets of fork networks (or even the same git repo) technically isn't a problem either (not trivial, but quite doable)

and I mean commits made to private repositories being public is always a security vulnerability no matter how much github claims it's intended

hyperpape
1 replies
21h31m

You're right that I shouldn't have given the impression that content addressed storage means as a technical matter that public content must never disappear. The phrasing was a bit sloppy. GitHub could, as a technical matter, choose to hide content that had previously been made public.

Nonetheless, given that GitHub exists to facilitate both anonymously pulling the entire history of the repository, and given that any forks would contain the full contents of that repository, it is very natural that GitHub would take the "once public always public" line.

and I mean commits made to private repositories being public is always a security vulnerability no matter how much github claims it's intended

I specifically said the third use case was different, because it is the one that doesn't involve you explicitly choosing to publish the commits that contain your private information. I did not and would not defend GitHub on that point.

Aeolun
0 replies
18h57m

it is very natural that GitHub would take the "once public always public" line

I don’t think that follows at all. Purging hashes without a link to a commit/repository would be pretty natural.

keybored
0 replies
21h25m

For the first two, git is based on content addressable storage, so it makes sense that anything that is ever public will never disappear.

No one can, with a straight face, say that they don’t restrict access because “this is just how the technology works”. Doesn’t matter if it is content addressable or an append-only FS or whatever else.

Even for some technology where the data lives forever somewhere (it doesn’t according to Git; GitHub has a system which keeps non-transitively referenced commits from being garbage collected), the non-crazy thing is to put access policy logic behind the raw storage fetch.

TheDong
1 replies
18h57m

I've used github for a long time, would not have expected these results, and was unnerved by them.

So you've used it heavily, but haven't read the docs or thought about how forks work, and are now surprised. This seems like a learning opportunity, read the docs for stuff you use heavily, read the man pages and info pages for tools you rely on.

None of this seemed surprising to me, perhaps because I've made PRs, seen that PRs from deleted repositories are still visible, and generally have this mental model of "a repository fork is part of a network of forks, which is a shared collection of git objects".

hnbad
0 replies
10h1m

Congratulations, you developed the right intuition.

However in UX/DX the question isn't whether users can develop the right intuition based on how they interact with software over time and reading through the documentation but how to shorten the time and effort necessary for that, ideally so that a single glance is enough.

Do you think reading all the documentation for every feature of every tool you use in your life is a good use of your time and something that should be expected of everyone? As someone developing software used by other people, I don't.

localfirst
0 replies
15h7m

Pretty much this. Weird seeing all the people trying to deflect/minimize this as a non-issue.

bogota
0 replies
12h4m

The mental gymnastics going on in this thread to justify this as a sane design is likely why software sucks more and more these days.

thih9
9 replies
21h11m

Can this be used to host illegal content? I.e.: fork a popular repo, commit a pirated book to the fork, delete the fork, use the original repo to access the pirated book?

What would github do after receiving a DMCA request in that case?

lnrd
1 replies
20h32m

That looks like the kind of loophole that could get GH to do something about this.

arccy
0 replies
19h49m

they have the ability to run what is essentially a git gc and drop unreachable commits
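
Locally, that cleanup looks like the commands below; presumably GitHub could run the server-side equivalent across a fork network. A sketch of the local form only, not GitHub's actual tooling:

    # expire every reflog entry, then drop objects no ref can reach
    git reflog expire --expire=now --all
    git gc --prune=now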

er4hn
1 replies
21h3m

One can safely assume they will find a way to follow the law rather than mumble that technically this is working as intended.

InsomniacL
0 replies
9h40m

One can safely assume

With something as nuanced as this, I wouldn't safely assume all processes, especially ones from a compliance (non-technical) department, account for it.

remram
0 replies
19h1m

It can be used to make it look like another project posted the content (though there is a warning: "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.").

You can't host anything this way that you can't already host in your own repository, and GitHub does have a way to remove content that will make it inaccessible, whether in your repository or through another.

nikeee
0 replies
19h29m

I think something like this was done when the takedown of yt-dl happened

majorchord
0 replies
18h59m

Can this be used to host illegal content?

It already is, even in GitHub org's own repos. Any time you make a PR, the /tree/ link to it stays valid forever, even if the repo author removes it.

Arnavion
0 replies
19h7m

I've seen bots make that kind of PR spam a few times. They'll make a PR that adds a random HTML or markdown file containing gambling spam or whatever, and then presumably post links to github.com/$yourorg/$yourrepo/blob/$sha/thatfile. I can't link an example because all the ones I know about were nuked by GH Support.

londons_explore
9 replies
23h11m

This isn't a bug IMO.

If you know the hash of some data, then you either already have the data yourself, or you learned the hash from someone who had the data.

If you already have the data, there is no vulnerability - since you cannot learn anything you don't already have.

If you got the hash from someone, you could likewise have gotten the data from them.

People do need to be aware that 'some random hex string' is in fact the irrevocable key to all the data behind that hash - but that's kinda inherent to git's design. Just like I don't tell everyone here on HN my login password - the password itself isn't sensitive, but both of us know it accesses other things that are.

If github itself was leaking the hash of deleted data, or my plaintext password, then that would be a vulnerability.

qual
4 replies
23h5m

If you know the hash of some data, then you either already have the data yourself, or you learned the hash from someone who had the data.

From the article, you do not need to have the data nor learn the hash from someone who had the data.

Commit hashes can be brute forced through GitHub’s UI, particularly because the git protocol permits the use of short SHA-1 values when referencing a commit. A short SHA-1 value is the minimum number of characters required to avoid a collision with another commit hash, with an absolute minimum of 4. The keyspace of all 4 character SHA-1 values is 65,536
londons_explore
3 replies
23h2m

In which case, yeah, that's a vulnerability. They shouldn't allow a short hash to match up against anything but public data.

gus_massa
2 replies
22h13m

It's common to use short hashes in pull requests, and then modify or rebase the commits.

The solutions are:

* Force people to use the full hash.

* Get used to a lot of dead links.

* Claim that it's a feature, not a bug.

guipsp
0 replies
20h43m

* Force people to use the full hash for commits pushed from now on?

Dylan16807
0 replies
12h23m

* Check visibility at the time of posting.

refulgentis
0 replies
23h2m

Read TFA.

jkaptur
0 replies
23h7m

That's counterintuitive, though - often, the whole point of a hash is that it's one-way.

haneul
0 replies
23h5m

If you know the hash of some data, then you either already have the data yourself, or you learned the hash from someone who had the data.

Don’t think so - the article mentions you can use the short prefix on GitHub, so you have a search space of 65536.

Aurornis
0 replies
23h4m

If you know the hash of some data, then you either already have the data yourself, or you learned the hash from someone who had the data.

You need to read to the end of the article where they show the brute-force way of getting the hashes.

keybored
9 replies
21h19m

People are so preoccupied with putting the code on GitHub. It’s like it doesn’t exist before it’s on GitHub.

If you’re not gonna share it then it hardly matters. Use a backup drive.

Git is distributed. You don’t have to put your dotfiles on GitHub. Local is enough.

JohnMakin
8 replies
21h15m

Your laptop breaks in a way that your disk cannot be recovered. Now what? How often are you backing up your disk? Probably much easier to type "git commit" and "git push"

keybored
6 replies
20h49m

Am I really gonna get interrogated on HN for talking about automatic and redundant backups? Give me a break.

JohnMakin
5 replies
20h46m

I wouldn't call the parent comment you're responding to an "interrogation" and I'm sorry you perceived it that way. You make a pretty extraordinary claim that local disk is better than a remote repository for storing/updating code for personal work - with no evidence to support this claim - so a followup question seems reasonable.

as far as "git is distributed" I don't know if that's the case if you keep it purely local, but hey, you seem to have it all figured out so good job.

keybored
4 replies
20h36m

I thought a person of your background (who no doubt has it all figured out) would surmise that I was talking about backing up to an external disk and not to another disk on the same laptop. And would grant another person some good faith and be able to generalize without spelling it all out for them: if the point is to back things up then maybe I can infer that other means of backup are also in the cards, like sneakernet or your own server or multiple locations. Huh

You can also back up to a remote that is not GitHub. You know, because the topic is GitHub and how promiscuous they are. Which is why I say: if you don't need your code to be "social" you don't need to put it on GitHub.

But even a remote repository is overkill. A backup plan with git bundle can be automated too. Set it and forget it. And backups are supposed to be automated, right? I ask because you have the relevant background here.
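
As a sketch of that plan, a bundle backup that can run from cron, assuming a repo at ~/src/project and an external disk mounted at /mnt/backup (both paths hypothetical):

    # snapshot every ref into a single dated file
    git -C ~/src/project bundle create "/mnt/backup/project-$(date +%F).bundle" --all
    # restoring later is just a clone from the bundle file
    git clone /mnt/backup/project-2024-07-24.bundle project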

hughesjj
2 replies
19h23m

GitHub, for better or worse, has been one of the easiest ways to back up configuration for... decades now. It's more secure than sending an email to yourself, Google Drive still doesn't have an official Linux client, AWS is too enterprisey for a handful of small backup files, and git is incredibly easy to set up and available on so many computers.

I completely get why people would want to use GitHub as a low-friction way to store versioned configuration data. It's a natural case for programmers to use the tool they're already using for something else. There are even repos for dotfiles saying stuff like 'hey fork this and make it private', because they know people want to manage dotfiles but might lazily leak some secrets in their own versions.

keybored
0 replies
8h27m

I completely get why people would want to use GitHub for a low friction way to store versioned configuration data.

Store the remote backup you mean? Because the versioned configuration data is of course just Git.

dwaite
0 replies
15h58m

I don't know if I would say it is easy as much as I'd say it is automated. I manage configuration changes to some hardware using git, and do manual backup. However, someone else came out with a script that will automate periodic commits to a GitHub account, and automates the setup.

I have a linux distribution which gives the option to allow login via a set of GitHub usernames, and will enable so by downloading each account's public SSH keys.

I don't use either of these (I don't think the second is even a good idea), but I can see why its popularity and price have driven deeper integration into products. Other network backup services or login infrastructure don't have the same level of ubiquitous API nor a comparable free tier.

JohnMakin
0 replies
20h30m

I thought a person of your background (who no doubt has it all figured out) would surmise that I was talking about backing up to an external disk and not to another disk on the same laptop. And would grant another person some good faith and be able to generalize without spelling it all out for them: if the point is to back things up then maybe I can infer that other means of backup are also in the cards, like sneakernet or your own server or multiple locations. Huh

Your snark notwithstanding, I actually did understand that an external disk resides outside of the laptop, and find your claim still fantastic and lacking evidence.

As for the rest of your post, you'll forgive my misunderstanding of whatever deeply nuanced point you're making here regarding backing up to a remote because of this at the end of your original post:

local is enough.

Anyway, seems like you need to take a break. Someone of my background has better things to do than engage in a flame war with someone clearly looking for a fight over a throwaway post.

mkl
0 replies
18h53m

It's much easier to use an automated backup system/service than to manually run commands.

hmottestad
8 replies
23h14m

The biggest gotcha here is probably that if you start off with a private repo and a private fork, making the repo public also makes the fork "public".

GitHub may very well say that this is working as intended, but if it truly is then you should be forced to make both the repo and fork public at the same time.

Essentially "Making repo R public will make the following forks public as well 'My Fork', 'Super secret fork', 'Fork that I deleted because it contained the password to my neighbours wifi :P'.

OK. I'm not sure if the last one would actually be public, but I wouldn't be surprised if that was "Working as intended(TM)" - GitHub SecOps

pants2
4 replies
21h36m

Any time you make a private repo public it's best to just copy that code into a new public repo and leave the private repo private. Otherwise you have to audit every previous commit and every commit on every fork of your private code.

umpalumpaaa
2 replies
20h42m

If I understand the issue correctly if you make the original repo public any private forks from other users are also effectively public. Right?

scarface_74
0 replies
15h35m

You create a new repository, “git init” it, copy your files over, and push the new repository to your open source remote.
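
A minimal sketch of that workflow, with all paths and remote names hypothetical:

    # start a brand-new repository so no private history comes along
    mkdir public-project && cd public-project
    git init
    # copy the working tree but not the old .git directory
    rsync -a --exclude .git ~/src/private-project/ .
    git add .
    git commit -m "Initial public release"
    git branch -M main
    git remote add origin git@github.com:example/public-project.git
    git push -u origin main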

hmottestad
0 replies
9h12m

Seems to be the case, yes. And I guess that the authors of those repos will not get a notification of any sort.

IshKebab
0 replies
11h16m

Yeah that's fine but the issue is GitHub doesn't make it clear that you need to do this.

kemitche
1 replies
20h37m

I agree. The other cases may be mildly surprising, but ultimately fall firmly into the category of "once public on the internet, always public." Deleting a repo or fork or commit doesn't revoke an access key that was accidentally committed, and an access key being public for even a microsecond should be assumed to have been scraped and usable by a malicious actor.

hmottestad
0 replies
7h53m

If you have a private repo, you would assume that nothing in that private repo becomes public unless you do something very explicit.

The issue here is that if you have a private repo and a private fork of that repo, and you make the private repo public while keeping the fork private, you are not explicitly told that your fork is effectively public, whether you want it to be or not.

Already__Taken
0 replies
3h12m

It's partly that you have to know the SHA, and that's quite unique. It's apparently unique enough for Google Photos to do "private" sharing without logins.

majorchord
7 replies
19h3m

No but I think attention should still be raised to it in the hopes they will fix it. The squeaky wheel gets the grease.

https://xkcd.com/1053

dwaite
2 replies
16h34m

First step would be to have them acknowledge that a documented behavior, which was part of their original design 16 years ago, is something that needs to be fixed.

As someone who has used git and GitHub extensively over that time, none of what the author documented was a surprise to me.

However, I also remember when people were trained to do a "Save As" when preparing a final Word document or Powerpoint for sharing with a third party. That certainly bit enough business users that Microsoft eventually changed the default behavior.

HeatrayEnjoyer
1 replies
13h21m

What about Save As bit people?

Dylan16807
0 replies
12h55m

It's not doing Save As that bit people. Think of a .doc file as a bad database format. It gets lots of in-place overwrites, and fragments of old versions stick around.

I can't find a lot that discusses it, but here's one mention: https://news.ycombinator.com/item?id=35252331

dncornholio
1 replies
10h50m

There's nothing to be fixed though.

hnbad
0 replies
9h41m

If people are surprised by this - and clearly a non-trivial number of people are - then even if the behavior works as intended it should be indicated in the UI at critical points.

GitHub is not simply a UI to actual git repositories it hosts. It also carries a lot of data that is not stored in the repository itself. The UI deliberately blends the two types of interactions. There's no such thing as "creating a pull request" in git, for example.

It's not at all unreasonable for a user to assume "forking" merely creates a copy with an upstream origin. Read through the steps again:

    1. You fork a public repository
    2. You commit code to your fork
    3. You delete your fork
Note that from a user's point of view, they only committed code to their "copy" of the repository, i.e. their own repository. They never pushed it upstream or created a pull request that references it. That abstraction is clearly wrong if you look at what actually happens but it's not difficult to see why a user might think this way, especially given that "a fork" often simply means "a copy", i.e. something standalone that then goes on to diverge from its origins (e.g. "Edge is a fork of Chromium" or "MariaDB is a fork of MySQL").

Of course the mistake is that the fork is not a copy. The fork isn't a fork (i.e. a separate copy that shares the original's history), it's a view of the original repository with its own refs. The commits are added to the same repository, only the refs that reference them aren't. This makes sense architecturally but it means most metaphors and analogies people likely bring to the table break down because they assume a fork is a copy, not a collection of refs layered over a shared repository of commits - after all, "allowing strangers to add commits to the repository" is what PRs are for. Except of course that's not what PRs do; PRs actually allow strangers to add references to commits to your branches.

Waterluvian
1 replies
18h48m

I love this xkcd.

We all need to embrace: Nobody has ever been impressed that you already knew something. When people share a discovery with you, it’s not about you. It’s about them and their joy of discovery. They want to share that joy with you.

nox101
0 replies
16h4m

If they're sharing the joy of discovery that's great. Lots of people though are gloating. "Haha, you're stupid, you didn't know X and I did". In other words, they're the ones not being charitable by assuming you don't already know X.

Trying to think of an example it usually goes something like

A: We should do X

B: No. See this document (the sharing part)

A: I wrote that document and I'm telling you we should do X (the "I already knew this" part)

lilyball
7 replies
23h15m

Really the only semi-interesting part of this is "if you make a private repo public, data from other private forks might be discoverable", but even that seems pretty minor, and the best practice for taking private repos public is to copy the data into a new repo anyway.

zelphirkalt
2 replies
22h46m

Is that a best practice in hindsight, or because it was known to some that this issue exists, or for what other reason do you consider it a best practice? Git history?

scarface_74
0 replies
15h21m

I worked in Professional Services at AWS for a little over three years. There was a fairly easy approval process to put our work out on the public AWS Samples (https://github.com/aws-samples) repository once we removed the private confidential part of the implementation.

I always started a new repository without git history. I can’t imagine trying to audit every single commit.

lilyball
0 replies
19h59m

When making a private repo public, there's a high chance that there was stuff in the private repo that isn't necessarily ok to make public. It's a lot easier to just create a new public repo containing all the data you want to make public than it is to reliably scrub a private repo of any data that shouldn't be there.

More generally, you probably want to construct a new history for the public repo anyway, so you'll want a brand new repo to ensure none of the scrubbed history is accessible.

xmodem
1 replies
22h22m

Even after a private repo is made public, it's common practice for new functionality to be worked on in private until it's ready.

account42
0 replies
7h22m

And according to TFA that case is not affected.

HL33tibCe7
1 replies
19h39m

You’ve completely missed the most dangerous thing mentioned, namely that private forks are not private.

Dylan16807
0 replies
12h33m

You’ve completely missed the most dangerous thing mentioned, namely that private forks are not private.

What do you mean "missed"? They described the situation where data is leaked from a private fork, which is when you make the original repo public.

There's no other time when data leaks. A public repo can't have ongoing private forks.

cxr
6 replies
23h22m

The implication here is that any code committed to a public repository may be accessible forever

That's exactly how you should treat anything made available to the public (and there's no need for the subsequent qualifier that appears in the article—"as long as there is at least one fork of that repository").

ilikehurdles
5 replies
22h55m

Sometimes I wonder if all the security features GitHub slathers on top of `git` lull people into a false sense of security when fundamentally they're working in a fully distributed version control system with no centralized authority. If your key is leaked the solution is to invalidate the key not just synthetically alter your version of history to pretend it never happened.

b800h
3 replies
22h38m

This is more of a problem if you leak private information with a commit by accident. You can't really revoke that.

kemitche
2 replies
20h34m

You can't reach out to any machines that have pulled down that commit and forcibly delete it, either.

hughesjj
1 replies
19h29m

But you can prevent anyone from doing so in the future and cross your fingers that no one has done so yet

b800h
0 replies
13h1m

As per this post, if a lot of people have forked your repo in the past, then you're stuffed.

noname120
0 replies
9h31m

Unless you specifically know and understand the ramifications of this GitHub idiosyncrasy, you have no way to tell that your key was possibly leaked. GitHub never informs you that someone accessed a commit created in your private fork.

josephcsible
4 replies
20h22m

How is this more of a vulnerability than the existence of sites like archive.org is? Isn't it just a fact of the Internet that once you make something public, you can't fully take it back later?

debugnik
1 replies
19h52m

The third case in the article shows private forks being leaked publicly when the upstream goes public.

The other two cases are indeed not worse than third-party archival, but they're still socially concerning. When you ask your own host to delete something you uploaded, you don't expect them to ignore you just because someone could have already archived it maybe. Making it harder to find can still be valuable; not all archives stay available forever, if any.

dwaite
0 replies
16h11m

When you ask your own host to delete something you uploaded, you don't expect them to ignore you just because someone could have already archived it maybe.

I've had a service say that deleting the information fully can take eight months.

hughesjj
0 replies
19h30m

Private forks were never public beyond this gotcha

bogwog
0 replies
20h0m

Because private forks are not meant to be public

haneul
4 replies
23h18m

Does any variant of this apply to DMCA’d repos in the repo network?

For example if the root repo is DMCA’d, or, if repo B forks repo A, then B adds some stuff that causes B to get DMCA’d. Can A still access B?

richbell
3 replies
22h56m

I believe the entire network is suspended.

haneul
2 replies
22h45m

A downstream DMCA suspends the upstream? That astonishes me. Anyone down to shut down React?

neongreen
1 replies
17h1m

According to https://docs.github.com/en/site-policy/content-removal-polic..., even an upstream dmca doesn’t suspend downstream by default (unless the copyright owner claims they believe all forks violate copyright) — so I would be surprised if downstream dmca suspended upstream.

NB: according to https://www.gtlaw.com/-/media/files/webinars/ian-ballon-may-..., page 4-470, it’s possible that failing to process a DMCA notice may only lead to losing safe harbor for the material identified in the notice, not for the entire service.

So GitHub might just choose to ignore the notice for React, get sued, and win, all without losing the safe harbor.

For less popular repos, I would not be surprised if you could take down any repo literally by submitting a completely bogus notice.

But honestly I still don’t know how much leeway - legally - service providers have in applying their own technical/legal expertise when evaluating DMCA notices. I’d appreciate any sources (court decisions, textbooks, whitepapers, descriptions of actual industry practices, etc) on the topic.

haneul
0 replies
12h40m

So GitHub might just choose to ignore the notice for React, get sued, and win, all without losing the safe harbor.

It wouldn't be React getting the notice. It would be, say, someone forking React, then adding a pull request with some clearly DMCA-violating material.

Then, if a downstream B DMCA shutdown doesn't affect upstream A's availability, there's still the question of A normally having access to B's non-merged commits even in the case of B's deletion. So, A should still be able to access the DMCA-violating material.

And, if A's access to B's non-merged, DMCA-violating commit is truly revoked without affecting A otherwise... why can't we have a "Strong Delete" button on GitHub? Would seem they'd have to have "Strong Delete" functionality to comply with downstream B hitting DMCA.

Basically, I'm feeling either a violation of principle of least astonishment, or a violation of "strong-DMCA".

Unless this is to support a feature in Git/GitHub that I am too noob to understand. :shrug:

fortran77
3 replies
22h32m

This is why for private and business projects, we don't use GitHub, we use Amazon CodeCommit.

swozey
1 replies
22h9m

Because of literally this issue? I'm not sure if you're doing a generic "I don't like github" or know for a fact that CodeCommit doesn't have issues like this.

This seems like a terrible security vector, but I'm not sure migrating thousands of repos out of GitHub makes more sense than training engineers to keep public and private repos completely separated, and you haven't explained why you use CodeCommit.

Unless it is this reason, which like I said, seems a bit heavy handed, but I rarely move private repos to public.

I kind of assumed this was a distributed Git problem, not Github, but I don't know.

fortran77
0 replies
16h35m

I use and like github for open source and publically shared projects.

makach
0 replies
22h19m

The article states that this “vulnerability” might exist in other SCM systems as well.

beardedwizard
3 replies
13h22m

Truffle is practically famous for clickbait like this. They have a YouTube channel full of it. Their behavior in the security industry steered us far away from them as a vendor.

jonahx
2 replies
13h10m

This is not clickbait.

It's well-explained and fairly presents the facts and GH's position. Based on the reaction here, it's clear many people are not aware of these footguns. If anything, the article is a public service.

beardedwizard
1 replies
12h34m

Based on the comments, many have known since 2018. GitHub has made multiple statements about it.

It's been written about multiple times, and now truffle is reposting old content with a name like IDOR to try to invent a new vuln class that doesn't exist.

The title of the post is misleading: a specific set of repos leaks data under specific circumstances - not every repo. The first two sentences of the post immediately downscope the claim made by the title.

I'm guessing you didn't bother to check out their YouTube.

This post is the only thing the OP has ever posted in 8 months, probably because it's truffle themselves. I stand by my statement, it's clickbait.

jonahx
0 replies
10h43m

Based on the comments, many have known since 2018. GitHub has made multiple statements about it.

And many more haven't known. It wouldn't be sitting on the front page with 1300+ upvotes otherwise. This is, effectively, not some ho-hum old news -- even if it was for you. And that's what so many are complaining about. The hypocrisy of violating POLA so blatantly and then shrugging it off, pointing to some explanation buried in the docs that they know damn well most people won't read, and saying "Hey the info is right there, on you if you didn't RTFM".

ajross
3 replies
22h25m

Most of this report is just noise. GitHub repos are public. Public stuff can be shared. Public stuff shared previously and then deleted is "still available", but it was shared previously and not really subject to security analysis.

The one thing they seem to be able to show is that commits in private branches show up in the parent repository if you know the SHAs. And that seems like a real vulnerability. But AFAICT it also requires that you know the commit IDs, which is not something you can get via brute forcing the API. You'd have to combine this with a secondary hole (like the ability to generate a git log, or exploiting a tool that lists its commit via ID in its own metadata, etc...).

Not nothing, but not "anyone can access private data on GitHub" as advertised.

beezlewax
1 replies
21h15m

There's a whole section here about how to brute force the hashes. You don't even need the full hash... just a shortened version using the first few chars.

ajross
0 replies
4h1m

I'm dubious. Searching for globally unique commit IDs is still at least a million+ request operation. That's easy enough in a cryptographic sense, but the attack in question requires banging a web UI, which is 100% for sure going to hit some abuse detector. I really don't think you can do this in practice, and the article certainly doesn't demonstrate it.

LoganDark
0 replies
21h21m

it also requires that you know the commit IDs, which is not something you can get via brute forcing the API

Well, GitHub accepts abbreviations down to as short as four hex digits... as long as there's no collision with another commit, that's certainly feasible. Even if there is a collision, once you have the first four characters you can just do a breadth-first search.
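
A sketch of what that probing could look like against the commit UI (the target repo is hypothetical, and GitHub's rate limiting will push back hard in practice):

    # try every 4-hex-digit prefix; a non-404 response means the prefix
    # resolved to some object in the repository's fork network
    for i in $(seq 0 65535); do
      prefix=$(printf '%04x' "$i")
      code=$(curl -s -o /dev/null -w '%{http_code}' \
        "https://github.com/someorg/somerepo/commit/$prefix")
      [ "$code" != "404" ] && echo "$prefix -> HTTP $code"
    done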

varispeed
2 replies
9h54m

I learned about it years ago when I accidentally pushed secrets to the repo. When, after rebasing and force pushing the branch, I was still able to access that commit, we decided to stop using GitHub.

account42
1 replies
7h6m

Hopefully you have since learned to read the documentation of the tools you use, or at least enough of it to understand the basic data model you are working with. Rebasing won't even (immediately) remove the commits from your local repo. And force pushing isn't some magic operation either.

Further, even if you had managed to delete the secrets from the repo, you have to assume that others have already copied them, so rotate your keys anyway.
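
A minimal local demo of that data model (the hash is hypothetical):

    git commit -am "oops: commit a secret"   # say this becomes abc1234
    git reset --hard HEAD~1                  # gone from the branch...
    git cat-file -p abc1234                  # ...but the object is still readable
    git reflog                               # and the reflog still references it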

varispeed
0 replies
5h13m

Yes, the credentials were invalidated promptly, before trying to remove them from GitHub. That said, we were using a different version control system and GitHub was new to us. This was many years ago.

Rizz
1 replies
8h5m

As mentioned, it works for valid short hashes. If there are multiple commits with those first 5 characters, then you need to make it more specific by brute forcing: appending, say, a 2, 4, 7, or 8 will lead to a valid commit.

sgc
2 replies
18h6m

This walks like a dark pattern and quacks like a dark pattern. People's entire livelihoods are at stake and they don't care. Most likely because plausible deniability and obscure TOS rights over how and when the code is used are more valuable to them than the reputation hit. It is hard to imagine this being very difficult to fix.

account42
1 replies
7h14m

People's entire livelihoods are at stake

No they aren't.

sgc
0 replies
5h26m

Sure they are. If somebody has a proprietary product that they happened to organize as a fork of an open source base at some point, it is exposed. The git organization aside, that is a very common business model.

qual
0 replies
23h10m

Come on, this is not surprising.

Very cool that it is not surprising to you.

But to others (some are even in this thread!) it is both new and surprising. They unfortunately missed your 4 year old comment, but at least they get to learn it now.

Dylan16807
0 replies
12h20m

Your argument from before is just that the user is not in full control.

Well, duh. That's not a reason to avoid every "private" feature in every product on the planet.

A failure in the system is still surprising. I could equally say "all software has bugs, so it's not surprising if your self-hosted solution leaks data". But that would be too dismissive, as you are being.

mmsc
2 replies
22h58m

This is such an enormous attack vector for all organizations that use GitHub that we’re introducing a new term: Cross Fork Object Reference (CFOR)

Have we stopped naming vulnerabilities cute and fuzzy names and started inventing class names instead? Does this have a logo? Has this issue been identified anywhere else?

booi
1 replies
21h45m

Introducing a new vulnerability... Git Forked™!

chatgpt: Create a logo image of a fork impaling a small gnome named "code"

riiii
0 replies
20h37m

Much better name.

It's very formally called Cross Fork Object Reference (CFOR). But commonly known as Git Forked! (Including the exclamation mark).

madewulf
2 replies
20h55m

In fact, there is a process to request complete removal of data, but it involves sending an email that will be reviewed by github staff: https://docs.github.com/en/site-policy/content-removal-polic...

On the other hand, once an API key or password has been published somewhere, you should rotate it anyway.

riedel
1 replies
19h53m

I was wondering how they could otherwise comply with legislation. Makes sense that there is a way to do this, e.g. for valid GDPR, DMCA, etc. cases.

majorchord
0 replies
18h54m

GitHub's own DMCA reporting repo has warez in it from deleted PRs that you can still access with the original link. It's been that way for years.

LeifCarrotson
2 replies
22h34m

IMO, the real vulnerability here is the way the Github Events archive exposes the SHA1 hashes of the vulnerable repositories. It would be easy to trawl the entire network to access these deleted/private repositories, but only because they have a list of them.

Similar (but less concerning) is the ability to use short SHA1 hashes. You'd have to either be targeting a particular repository (for example, one for which a malicious actor can expect users to follow the tutorial and commit API keys or other private data) or be targeting a particular individual with a public repository who you suspect might have linked private repositories. It's not free to guess something like "07f01e", but not hard either.

If these links still worked exactly the same, but (1) you had to guess 07f01e8337c1073d2c45bb12d688170fcd44c637 and (2) there was no events API with which to look up that value, this would be much, much less impactful.

SnowflakeOnIce
0 replies
20h16m

'git clone --mirror' seems to pull down lots of additional content also.
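
For example (repo hypothetical), a mirror clone fetches every ref the server advertises, which on GitHub reportedly includes refs/pull/*, so objects that never appeared in any branch can come down too:

    git clone --mirror https://github.com/example/repo.git
    # list the pull-request refs that came along with the mirror
    git -C repo.git for-each-ref refs/pull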

yobid20
1 replies
16h32m

All your private photos on Google Drive have publicly accessible URLs too. Most people don't know all their private photos are exposed to the world.

scarface_74
0 replies
15h17m

As far as I know, Google only creates a link once you explicitly ask it to share

tamimio
1 replies
22h9m

I don’t use GitHub for anything serious, rather my own Gitea. However:

Any commits made to your private fork after you make the “upstream” repository public are not viewable.

Does that mean a private repo that has never been and never will be public isn't accessible? That scenario wasn't mentioned.

fedorareis
0 replies
21h18m

My understanding is that you are correct. If the repo and all of its forks stay private then the only people that would be able to view them are people who have permissions to access those repos.

renewiltord
1 replies
22h38m

To fork private, I always just make a new repo and push to it. Looks like that behaves correctly here.

kemitche
0 replies
20h33m

Agreed. If anything, github should remove the option to change a repo from private to public or vice versa. Force creation of a new repo with the correct settings.

poikroequ
1 replies
22h21m

Microsoft: It's the EUs fault!

Also Microsoft: It's a feature!

theragra
0 replies
21h50m

It was known before Microsoft

kassah
1 replies
22h19m

In response to the end of the article ("it’s important to note that some of these issues exist on other version control system products"): I actually have experience helping someone with an issue on BitBucket involving PII data that you can't rotate.

Once we eliminated the references in the tree and all forks (they were all private thankfully), we reached out to BitBucket support, and they were able to garbage collect those commits, and purge them to the point where even knowing the git hashes they were not locatable directly.

gbalduzzi
0 replies
10h16m

GitHub also supports that if you reach out to support directly.

8organicbits
0 replies
16h23m

If I'm a CTO how do I protect my company from this foot gun? Do I need to regularly train everyone with a GitHub account about the details, is there a setting I can toggle, or...?

dathinab
1 replies
21h29m

commits made to private repos being public (points 2 & 3) is always a non-minor security vulnerability IMHO

it doesn't matter if it's behaving as intended or how the forks are set up

also, point 1 implies that GitHub likely doesn't properly GC its git storage, which could have all kinds of problematic implications beyond point 1 w.r.t. purging accidentally leaked secrets or PII...

all in all it just shows GitHub might not take privacy and security seriously... which is kinda hilarious given that the customers using private repos tend to be the paying customers

keybored
0 replies
21h22m

You're right that they don't let commits get GC'd. They jump through hoops to keep commits that are not transitively referenced from being garbage collected. Just assume that every commit is kept around for "auditing".

One GitHub employee even contributed a configuration to Git which allows you to do the same thing: run a program or feed a file which tells the GC what nodes not to traverse.

ahpook
1 replies
19h35m

Hubber here (same username on github.com). We in GitHub's OSPO have been working on an open source GitHub App to address the use case where organizations want to keep a private mirror of a public upstream repo, so they can review code, remove IP/secrets/keys that get committed, and squash history before any of those changes are made public. Getting a beta release this week, in fact - check it out, I'm curious what y'all think about the approach.

https://github.com/github-community-projects/private-mirrors

dttocs
0 replies
3h50m

Looks like a promising tool and workflow to mitigate the risks we are discussing here. If you haven't already done so, it might help the discussion here if you could highlight how this app deals with the issues outlined. Is the intent of the mirror repo creation that it's more-or-less equivalent to "git clone --mirror"? I took a quick look at the code, and didn't see a direct correspondence with "git clone --mirror" when creating the mirror repository.

agentdrek
1 replies
22h57m

Clearly a POLA violation (principle of least astonishment)

account42
0 replies
7h21m

So it's using an uncommon acronym when you're only referencing the thing once.

Osiris
1 replies
16h46m

So does that mean that forked repos don't do garbage collection of unreferenced commits?

If I force push and orphan a commit, I expect that will get garbage collected and be gone forever.

Or if I commit a file I shouldn't have and rewrite my repo history and push up a whole new history, is the old history still hanging out forever?

If true, then it seems that there is no way to delete any commits at all from any repo that has any forks?

dwaite
0 replies
16h22m

If true, then it seems that there is no way to delete any commits at all from any repo that has any forks?

I do not believe the presence of forks matters. Or rather, your version is the initial fork.

My impression is that garbage collection is an expensive and disruptive operation (for all forks), and so there's no button or API for it. Hence the recommendations to contact support if you accidentally commit an API key or the like (but really, you have already rotated that key, right?)

zelon88
0 replies
5h25m

Doesn't this kind of make sense? We are not dealing with personal property. We are dealing with term licensed software.

GitHub is a software distribution network, like the App Store or Steam. They grant you access to licensed content, which you self-license, and then they facilitate access for you, based on the honor system. But some things can just be assumed to be true for the sake of simplicity and liability.

For example, if I make a repo public and then take it private, the hashes that were obtained while it was open are still open. If I make a repo that's closed and open it, the whole thing is open.

If you fork a public repo and make private commits on it to a software distributor like Github, that is probably just going to end in a violation of the license. In this scenario, Github is saving you from yourself.

yread
0 replies
22h20m

On the positive side this takes care of all those companies forking open source software and not contributing back

yard2010
0 replies
10h40m

1 more reason to use GitLab <3

x-yl
0 replies
11h9m

This behaviour is also important for ergonomic submodules. The .gitmodules file lists the upstream repo as the origin. So, if you're modifying an upstream project in a submodule and push changes to a fork, it's important that the SHA that git tracks is still reachable through the upstream link.
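
For example, a hypothetical .gitmodules entry: the superproject pins a SHA, and git fetches it through the upstream URL recorded here even when the commit was actually pushed to a fork:

    # inspect the recorded submodule source
    git config -f .gitmodules --get-regexp 'submodule\.libfoo\.'
    # submodule.libfoo.path vendor/libfoo
    # submodule.libfoo.url https://github.com/upstream/libfoo.git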

Ultimately I don't think it's feasible to break this behaviour and the most we can hope for is a big red warning when something counterintuitive happens.

wtcactus
0 replies
5h38m

Should GitHub be liable for any damages caused by this issue, like some think CrowdStrike should be for what happened last week?

Morally it seems even worse: CrowdStrike did it by accident; GitHub has known about this for years.

welder
0 replies
5h45m

The only valid one is the last (3rd) one:

Accessing commits on a private fork when its upstream is made public

The other 2 are just common sense: push something to a public repo and it's public forever. Everyone knows once something's on the internet it's already too late to make it secret again.

throwawaydummy
0 replies
15h44m

Tangential to the article but interested in seeing how Microsoft will fare compared to Tesla

solatic
0 replies
11h48m

There's quite a long list of "open core" companies whose model is: start from a private repository (i.e., the company is in stealth), make a private fork that will include for-profit code with enterprise features, then make the original repository public so that the core will be open source.

That GitHub is telling these companies, and bear in mind that these companies are paying customers of GitHub, yeah we don't care that your private proprietary code can be hacked off GitHub by anybody, is incredibly disturbing. Is there really not enough pressure from paying customers to fix this? Is Microsoft just too big to care?

scosman
0 replies
18h29m

I maintain a pretty popular template for SaaS websites. Every few weeks someone would send a PR with all their private fork data, then quickly try to delete it.

Making it a "template" repo mostly fixed the issue. That creates a copy instead of a fork. However it still happens from time to time.

rocqua
0 replies
11h53m

They have the yellow banner to detect when you likely access a hash like this. Why do they allow those commit hashes to be accessed through the short commit hashes?

primer42
0 replies
15h34m

So the moment something is published on the Internet publicly, there's a chance it will be saved and you will not be able to get it deleted.

That, unfortunately, sounds like the result of publishing something on the Internet. Not GitHub's fault.

otagekki
0 replies
19h8m

A serious security issue indeed, if someone knows the hash.

How I manage this is that every time I want to open-source a previously private feature, I take the changeset diff and apply that to the files in the public repository. Same features, but plausibly different hash.
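
A minimal sketch of that diff-and-apply approach (repo, branch, and file names hypothetical):

    # export the feature as a plain patch, with no commit objects attached
    git -C private-repo diff main..feature > feature.patch
    # apply it to the public tree and commit it fresh
    git -C public-repo apply feature.patch
    git -C public-repo add -A
    git -C public-repo commit -m "Add feature"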

nostrademons
0 replies
20h58m

Cool, another way to access youtube-dl next time it gets deleted from GitHub.

mro_name
0 replies
3h40m

My mother can't. And she doesn't mind.

mro_name
0 replies
3h42m

I always acted as if there were no such thing as private data on github. Maybe even the internet as a whole.

miguelaeh
0 replies
23h19m

Wow. This is wild!

midtake
0 replies
20h40m

Just rebase/squash everything.

makach
0 replies
22h20m

A “delete” means it should be gone forever from the service it was removed from.

“Private” means it should only be available to specific involved parties only.

If you implement any other behavior to these concepts you are implementing anti patterns.

We need to be precise and consistent in the wording of the functions we provide, so that we can easily understand what is going on without having to interpret documentation.

letmeinhere
0 replies
17h57m

I wonder if copyleft projects can use this to find license violations and force the altered code into the open.

lenerdenator
0 replies
17h29m

I wonder how all of the companies using "private" repos on GitHub feel about this.

j2kun
0 replies
19h31m

The title makes it seem more severe than it is. This only applies to GH forks of public repositories (or repositories that become public). Forks mirror the upstream repo's visibility.

j-pb
0 replies
21h1m

Commit hashes are essentially capabilities; you should be able to access any data that you have a capability for. But allowing access via a 16-bit prefix is just idiotic, and equivalent to accepting just the first two bytes of a 256-bit cryptographic signature...

irrational
0 replies
3h50m

So... this is only an issue with forking, right? And, forking is not the same thing as branching... right? I'm just trying to make sure I understand this since I do branching all the time, but have never forked anything.

globular-toast
0 replies
10h58m

I actually think this is a good thing and should simply be made more clear. The reason is the following from the article:

I submitted a P1 vulnerability to a major tech company showing they accidentally committed a private key ... They immediately deleted the repository,

That is a ridiculous response to a compromised key. The repository should not have been "deleted", the key should have been revoked.

Imagine if you lost a bag with 100 keys to your house. Upon realising you desperately try to search for the bag only to find it's been opened and the keys spread around. You comb through the grass and forests nearby collecting keys and hoping you find them all.

Or you just change the locks and forget about it.

If you upload something, anything, to a computer system you do not own you need to consider it no longer secret. It's as simple as that. Don't like it? Don't do it.

I detest things like delete buttons in messaging apps and, even worse, email recall in Outhouse-style email apps. They just give people a false sense of security. I've been accidentally sent someone's password several times on Teams. Yeah you deleted the message, but my memory is very good and, trust me, I still know your password.

If there's a security problem here it's in people believing you can delete stuff from someone else's system, or that that systems make it look like you can. The solution is the same though: education. Don't blame GitHub. Don't force them to "fix" this. That will only make it worse because there are still a million other places people will upload stuff and also won't actually delete stuff.

gigatexal
0 replies
12h4m

So if I read the article correctly if I never fork or otherwise contribute from my private repo I’m good?

galkk
0 replies
20h20m

I won't be surprised if "right to be forgotten"/GDPR abusers will spam github and force them to act on it, eventually.

----

This is clearly documented and can be explained even to non-technical managers.

From my POV calling that vulnerability is trying to build a hype.

I think that having quote from here on visibility changing settings page would be even more clear: https://docs.github.com/en/pull-requests/collaborating-with-...

fmeyer
0 replies
18h50m

I reported a similar and, in my opinion, even more damaging issue (https://hackerone.com/reports/2240374) and they also dismissed it as by design.

Turns out you could even invite external collaborators into your fork and totally bypass enforced SSO.

Even if you block forking on your main repo, the existing forks remain active and can still pull from upstream.

It feels like if you need proper security, you have to go with Enterprise.

est31
0 replies
15h6m

Earlier thread: https://news.ycombinator.com/item?id=39481933

I'm not so sure about the "forever" part as git gc is a thing, and at least in 2013 they ran it regularly: https://stackoverflow.com/a/56020315

No idea about nowadays though. There is this blog post:

https://github.blog/engineering/scaling-gits-garbage-collect...

We have used this idea at GitHub with great success, and now treat garbage collection as a hands-off process from start to finish.
ericfrederich
0 replies
4h17m

Wow, that's crazy. I tried a 6 digit hash and got a 404, then I tried another 6 digit hash and got "This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository."

Insane

ericfrederich
0 replies
5h8m

1) Fork the repo. 2) Hard-code an API key into an example file. 3) <Do Work> 4) Delete the fork.

... yeah if <Do Work> is push your keys to GitHub.

einpoklum
0 replies
23h22m

Data that you place with an entity that is a large organization with many commercial and government ties must be assumed to be accessible to some of those parties.

And if that entity has a complex system of storage and retrieval of data by and for many users, that changes frequently, without public scrutiny - it should be assumed that data breaches are likely to occur.

So I don't see it as very problematic that GitHub's private repositories, or deleted repositories, are only kind-sorta-sometimes private and deleted.

And it's silly that the article refers to one creating an "internal version" of a repository - on GitHub....

Still, interesting to know about the network-of-repositories concept.

eezing
0 replies
21h12m

I’m glad I don’t use forks

dncornholio
0 replies
10h51m

People should realize that once you upload something, it will be out there, forever. I assume this happens to everything.

Trusting some company will actually delete your stuff is kind of naive in my opinion.

As for the example of people forking and putting an API key in the repo: I would never let my people do this. Once you push, it will be "out there".

devinsewell
0 replies
21h10m

And people have been yelling at me for refusing to ever use GitHub since 2013, lol.

daitangio
0 replies
12h7m

Question: by “deleted” do you just mean a commit that deletes the data? Because if you remove the commit itself from the repo, it should disappear.

crvdgc
0 replies
20h17m

I think the first two points are a result of private data (commit/fork/issue) being able to refer to public data without making the reference public.

Say a private commit depends on a public commit C. Suppose in the public repo, the branch containing C gets deleted and C is no longer reachable from the root. From the public repo's point-of-view, C can be garbage-collected, but GitHub must keep it alive, otherwise the deletion will break the private commit.

It would be "a spooky action at a distance" from the private repo's POV. Since the data was at a time public, the private repo could have just backed up everything. In fact, if that's the case, everyone should always backup everything. GitHub retaining the commit achieves the same effect.

The public repo's owner can't prevent this breakage even if they want to, because there's no way to know the existence of this dependency.

The security issue discussed in the post is a different scenario, where the public repo's owner wants to break the dependency (making the commit no longer accessible). That would put too much of a risk for anyone to depend on any public code.

My mental model is that all commits ever submitted to GitHub will live forever and if it's public at one time, then it will always be publicly accessible via its commit hash.

cottsak
0 replies
15h42m

Key takeaways for me:

1) Never store secrets in any repo ever! As soon as you discover that its happened, rotate the key/credential/secret asap!!

2) Enterprises that rely on forking so that devs can collab are fucked! Protecting IP by way of private repos is now essentially broken on GH!

3) what the actual fuck github!!??

chadsix
0 replies
4h28m

I'm surprised that nobody suggested self hosting a GitLab or Gitea instance. [1]

[1] https://ipv6.rs/cloudseeder

bogota
0 replies
13h42m

Holy shit. What a joke of a company.

bladegash
0 replies
23h5m

Unrelated, but another interesting one is any non-admin contributors being able to add (and I believe update) secrets in a private repo for use in GH actions. It can’t be done via the UI, but can be done via the API or VSCode extension.

When I looked into it a while back, apparently it is intended behavior, which just seems odd.

asmor
0 replies
4h50m

I found some obscure instances where user expectation doesn't match reality on GitHub before, and nobody there cares.

If anyone's wondering: organizations that require SAML are included in your organization list even when you don't have a SAML session, such as when signing in elsewhere via OAuth. This is unlike generalized per-organization app authorizations, where GitHub can actually hide organization membership. The only way to find out if a user has a SAML session is for the consuming app to request the membership with their token and interpret a 403 as "no SAML session". As far as I know only Tailscale implemented this. This really sucks for apps like SonarCloud, where someone can now view work code from a personal GitHub account they thought was cleanly separated from professional use.

amluto
0 replies
14h24m

ISTM there’s a straightforward mitigation or two available to GitHub:

1. If a URL would be in the “[t]his commit does not belong to any branch of this repository, and may belong to a fork outside of the repository” and that URL uses a shortened commit hash, return 404 instead. Assuming no information leakage via timing, this would make semi-brute-force probing via short hashes much harder.

GitHub is clearly already doing the hard work for this.

2. A commit that was never public should not become public unless it is referenced in a public repository.

This would require storing more state.

account42
0 replies
9h20m

Great website design that loads fine without scripts but then runs something that requires features found only in newer browsers and then deletes the entire content when that fails. Why?

Szpadel
0 replies
22h19m

Even better: you can actually commit to other forks if they create a pull request to you.

(There is a checkbox allowing that when you open a PR that I bet almost no one has noticed.)

I reported that years ago, and all they changed was extending the documentation about this "feature".

My main issue was that you cannot easily revoke this access, because the target repo can always reopen the PR and regain write access.

But they basically stated it "works as intended".

Osiris
0 replies
16h39m

The few times I made a private copy public, I made a brand new git repo, copied the working copy over, and published that as public. I'd never include past private git history when making something public.

NavinF
0 replies
19h31m

Commit hashes can be brute forced through GitHub’s UI, particularly because the git protocol permits the use of short SHA-1 values when referencing a commit. A short SHA-1 value is the minimum number of characters required to avoid a collision with another commit hash, with an absolute minimum of 4. The keyspace of all 4 character SHA-1 values is 65,536 (16^4). Brute forcing all possible values can be achieved relatively easily.

But what's more interesting: GitHub exposes a public events API endpoint. You can also query for commit hashes in the events archive, which is managed by a 3rd party and saves all GitHub events for the past decade outside of GitHub, even after the repos get deleted.

Oof

Jean-Papoulos
0 replies
11h15m

Thank you for relaying this. I'll be moving off Github this weekend.