I reported this on their HackerOne many years ago (2018 it seems) and they said it was working as intended. Conclusion: don't use private forks. Copy the repository instead.
Here is their full response from back then:
Thanks for the submission! We have reviewed your report and validated your findings. After internally assessing the finding we have determined it is a known low risk issue. We may make this functionality more strict in the future, but don't have anything to announce now. As a result, this is not eligible for reward under the Bug Bounty program.
GitHub stores the parent repository along with forks in a "repository network". It is a known behavior that objects from one network member are readable via other network members. Blobs and commits are stored together, while refs are stored separately for each fork. This shared storage model is what allows for pull requests between members of the same network. When a repository's visibility changes (Eg. public->private) we remove it from the network to prevent private commits/blobs from being readable via another network member.
Honest question. Submitting these types of bugs only to get a "we have determined it is a known low risk issue..." makes it seem like they really don't want to pay for someone else's time and dedication in making their product safer. If they knew about this, was it disclosed somewhere? If not, I don't see them playing a fair game. What's the motivation to do this if in the end they have the final decision on whether to award you or not? To me it looks similar to what happens with Google Play/the Apple App Store deciding whether or not an app can be uploaded/distributed through them.
Edit: I popped this up because to me it is absolutely miserable for a big company to just say: "Thanks, but we were aware of this".
Not defending GH here (their position is indefensible imo) but, as the article notes, they document these behaviors clearly and publicly:
https://docs.github.com/en/pull-requests/collaborating-with-...
I don't think they're being underhanded exactly... they're just making a terrible decision. Quoting from the article:
Based on some (admittedly not very thorough) search, this documentation was posted in 2021, three years after my report.
But that would still mean they didn't intend to fix it, hence not paying a bounty is fair.
It's a bug bounty, not a "only if we have time to fix it" bounty.
He found a security problem, they decided not to act on it, but it was still an acknowledged security problem
The point of a bug bounty is for companies to find new security problems.
If the (class of) problem is already known, it’s not worth rewarding.
I can see this argument making a bit of sense, but if they documented this 3 years after the issue was reported, they don't have a way to demonstrate that they truly already knew.
In the end it boils down to: is Github being honest and fair in answering bug bounty reports?
If you think it is, cool.
If you don't, maybe it's not worth playing ball with Github's bug bounty process
It doesn't matter if they knew. If they don't deem it a security vulnerability --- and they have put their money where their mouth is, by documenting it as part of the platform behavior --- it's not eligible for a payout. It can be a bug, but if it's not the kind of bug the bounty program is designed to address, it's not getting paid out. The incentives you create by paying for every random non-vulnerability are really bad.
The subtext of this thread is that companies should reward any research that turns up surprising or user-hostile behavior in products. It's good to want things. But that is not the point of a security bug bounty.
I would argue that even if the behaviour was as intended, at least the fact that it was not documented was a bug (and a pretty serious one at that).
Again: you don't generally get bounties for finding "bugs"; you get them exclusively for finding qualified vulnerabilities.
That's true, but what's stopping a company from documenting a security issue as a known (mis)behaviour/bug? [*]
Companies can join/setup a bug bounty program, and just use it as a fig leaf for pretending to care about their own product/service's security.
Of course bug bounties can be and are abused daily by people who report trivial non-issues in the hope of compensation.
But in the same way, companies can also be bad actors in the way that they engage with bounties. I would usually expect big names (like Google, Apple, Github, etc.) to be trustworthy...
[*] Of course what stops companies is precisely them not being seen as trustworthy actors in the bug bounty system anymore... And for now, that's a decision that individuals have to make themselves
No large company cares even a tiny bit about the money they're spending on bug bounties. They would literally lose money trying to cheat, because it would cost them more in labor to argue with people than to pay out. In reality, the bounty teams at Google and Apple are incentivized to maximize payouts, not minimize them.
If you don't trust the company running a bounty, don't participate. There are more lucrative ways to put vulnerability research skill to use.
So much this. It's pretty clear that most people commenting on this thread have never been involved in a bug bounty program on the company's side.
Bug bounty programs get a lot of reports, most of which are frankly useless and many of which are cases of intended behavior subjectively perceived as problematic. Sifting through that mess is a lot of work, and if you regularly pay out on unhelpful reports you end up with many more unhelpful reports.
This particular case definitely feels like one where the intended behavior is horribly broken, but there are absolutely many cases where "this is intended" is the only valid answer to a report.
If a renowned company won't pay a bug bounty, a foreign government often will.
Good luck selling this to a foreign (or domestic) government. It doesn’t seem valuable to me, but who knows, maybe someone finds it worth payout.
Why would a foreign government pay for a commonly known security limitation of a product?
It's only a bug if it's not intended
Do some companies intend for their platform to feature remote code execution?
Remote code execution is literally a feature of GitHub…
Sandboxed code execution is a bit different than RCE.
Some might very well do. E.g. a company with a service for training hackers and security researchers.
In this case the question is moot, as this doesn't involve remote code execution.
Make a general point, get a general answer.
If the criteria for bug is "intended", and that's solely judged by the company, then broken auth et al. suddenly become part of their product design.
If it quacks like a bug, it's a bug.
I think a lot of developers and companies interpret "that's the way the code or process works" as intentional behavior, which is not always the case.
The property (“bug”) in question is an inherent and intentional property of meekly-tree type storage systems such as git.
Calling this a bug is like reporting that telnet sends information unencrypted.
The actual bug is in the way that their UX paradigm sets user expectations.
Don't blame Git for Github decisions.
Github chooses to store all "Github forks" in the same repository, and allow accessing things in that repository even when they are not reachable by the refs in the namespace of one "fork". That is purely a Github decision.
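To make that concrete, a rough sketch (hypothetical owner/repo names; the SHA in question only ever existed in a sibling fork, never on any branch of the upstream):

    # GitHub resolves commits by SHA against the whole fork network, so the
    # upstream repo's commit endpoint can serve an object none of its own
    # refs can reach. I believe the REST API mirrors the web UI here.
    curl -s https://api.github.com/repos/upstream-owner/repo/commits/<sha-from-the-fork> \
        | jq -r '.commit.message'

    # A plain clone won't contain that object, because clones only fetch
    # what the advertised refs can reach:
    git clone https://github.com/upstream-owner/repo.git && cd repo
    git cat-file -e <sha-from-the-fork>   # fails: object not in the clone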
s/meekly/Merkle/g
From the article:
"We surveyed a few (literally 3) commonly-forked public repositories from a large AI company and easily found 40 valid API keys from deleted forks."
This is how your customers get their entire cloud taken over: because you made a stupid, stupid decision and, instead of fixing it when warned (repeatedly!), you decide to just blame the customer for not reading page 537 paragraph 3 subsection B about the counter-intuitive security footgun you've left in your product.
This is negligence, pure and simple.
If you published a key, you must assume someone copied it and that deleting references to it is not sufficient. You must rotate that key now, and should check whether it was used improperly. This is pretty basic incident response.
The thing about exposing commits that were only ever in a private repo is pretty indefensible, but not garbage collecting public commits on delete shouldn't matter.
Why would anyone think that a private fork is "published"!?
This is the footgun here: The UI is telling you that nobody can see the secrets you committed to your private copy, but actually it is widely accessible.
A similar example of UI-vs-reality mismatch that I've noticed recently is the Azure Storage Account "public" visibility. By default, the portal uses your authenticated account for RBAC access checks, so if you click around it'll say something like "you don't have browse access". This looks secure, but attempting access anonymously would have succeeded!
I had a customer recently where this happened -- they clicked through every Storage Account to "check" them, convinced themselves they were secure, meanwhile they had database backups with PII accessible to world+dog!
Putting keys in repos should not be done, full stop. Even if GitHub forks weren’t public, their _private_ repos could one day be compromised. Instead, store keys in a shared vault, .gitignore the .env and have a .env.example with empty keys.
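Something like this (file names purely illustrative):

    # keep real secrets out of version control
    echo ".env" >> .gitignore

    # commit only a template with empty values; real values live in the
    # shared vault or in each developer's local .env
    printf 'API_KEY=\nDATABASE_URL=\n' > .env.example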
Any time I hear “shouldn’t be done” I translate that to “will happen regularly”.
I do see this regularly in my work. All but one dev team I’ve worked with over the last few years has done this.
Don't blame the end user for doing something you don't want them to do if it is more convenient to do and works without immediate consequences. Redesign it or rethink your assumptions.
The bit you quoted is referring to public forks that were deleted. That sounds like a non-issue to me, and I'm not at all surprised that
1. Public "forks" are just namespaced branches that share an underlying repo
2. They don't run the garbage collector all the time
I'd be surprised if those weren't true.
Like I said, the behavior with private forks sounds indefensible.
The OP is mixing together multiple things. Being able to access deleted public data isn't that surprising and definitely isn't a security issue as far as leaking keys is concerned (it was already public. Assume it has been cloned). Being able to access private forks is a footgun/issue. They should be garbage collecting as part of public repo creation so that unreferenced commits from private forks aren't included.
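For reference, this is roughly the local analogue of the clean-up being asked for; GitHub would have to do the server-side equivalent when a network member goes public:

    # Unreferenced objects only disappear once reflog entries expire and gc prunes them
    git reflog expire --expire=now --all
    git gc --prune=now
    git cat-file -e <orphaned-sha> || echo "object is gone"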
As far as I can tell, they never run the garbage collector. Code I pushed to a fork that was deleted several years ago can still be accessed through the original parent repo.
Anyone who put sensitive content in a git repo should consider it published anyway. Git is a decentralized tool; as a company you cannot control the number of git remotes that may host your code. Assuming your code is only hosted as a private repo on one specific remote git server is at best naive. This is without even considering the number of copies stored on dev computers.
Besides, anyone who put stuff on a third party publicly accessible infrastructure should consider it published anyway as breaches happen all the time.
If you happen to have api keys stored in a git repo, the only viable response is rotating those keys.
Shouldn't that be on the config page for the repo below the "private" button with a note saying private is not actually private if it's a fork? And ditto for delete?
As the article pointed out, GitHub already publicly documented this vulnerability.
My employer doesn't pay out for known security issues, especially if we have mitigating controls.
A lot of people spam us with vulnerability reports from security tools we already use. At least half of them turn out to be false positives we are already aware of. In my opinion, running a bug bounty program at all is a net negative for us. We aren't large enough to get the attention of anyone competent.
I'm honestly not yet convinced that is enough here - I've fallen victim to this without realizing it - the behaviour here is so far removed from how I suspect most users' mental model of github.com works. For me none of the exposed data is sensitive, but the point remains: I was totally unaware it would be retrievable like this.
If the behaviour flies so against the grain, just publishing it in a help doc is not enough I'd argue. The linked article makes the exact same argument:
The problem with this line of argument is that the fundamental workings of git are also surprising to people, such that they routinely attempt to address mistaken hazmat commits by simple reverts. If at bottom this whole story is just that git is treacherous, well, yeah, but not news.
There's a deeper problem here, which is that making the UX on hosting sites less surprising doesn't fix the underlying problem. There is a best-practices response to committing hazmat to a repository: revoke the hazmat, so that its disclosure no longer matters. You have to do this anyway. If you can't, you should be in contact with Github directly to remove it.
Is "git" relevant here? Forking isn't a git concept, and none of this behaviour has much to do with git; it's all GitHub.
Also, you can revoke an API key, but you can't revoke a company-proprietary algorithm that you implemented into a fork of a public project.
Like I said: if you can't revoke the thing you committed, you need to get in touch with Github and have them remove it. That's a thing they do.
Sure, but the whole point of the article is that people don't know their "private" forks aren't private. You can't get in touch with GitHub if you've never had any indication that anything's wrong.
The solution for that is better UX.
aside: I think it's questionable to say that forking isn't a git concept. it's just a branch on a different upstream. Those two upstreams could simply be two different folders on your machine, or shared server.
I suppose the branding and UI for it could be a counter-argument, but then again Github allows regular branch creation / committing / merging in their UI. Their main value add (not downplaying it—it's huge) on top of git (besides ancillary things like CI / linters) is the ability to comment on a branch's diff, i.e. a PR review.
There's an entire custom UX flow for forking on GH that is not part of git at all. I think it's very fair here to discuss "fork" in the specific sense Github uses it, as it's what has led to some of the issues discussed. There are absolutely means of providing fork functionality that don't have some of the problems we are discussing, but that's not how GH chose to build it.
Yeah, and we can blame a lot of that on the Git developers, but they never use words like ‘public’ and ‘private’ to indicate things they’re not.
Regardless, the vulnerability in Github forks falls squarely on Github, and is not mitigated by Git being hard to understand in the first place.
Two things can be true (and are)
1. GitHub has a nasty privacy/security hole, where commonsense expectations about the meanings of common words are violated by the system.
2. Github has publicly announced that they don't care about this part of user data security (private code), so they won't pay people to tell them what they already know and have announced.
Github won't pay you to tell them they are wrong when everyone already knows.
As the author pointed out, the documentation was written three years after he reported it.
Beyond that, it is also a batshit crazy implementation. Just imagine if AWS still allowed AWS credentials to give access to a deleted account.
The expectations for AWS and public repository hosting are not the same. If you leaked something to a public GitHub repo you should assume that it has been cloned the second you pushed it.
This is about access to private repos, not public ones:
"Anyone can access deleted and private repository data on GitHub"
For both sides it turns into a net negative. Better to keep your bugs and use them when needed or sell them to others to use if possible.
Let's get back to what we had before, when multiple people could find the same bug and exploit it if needed. Now the one person who finds the bug reports it, it gets patched, and they don't get paid.
No large company running a bug bounty cares one iota about stiffing you on a bounty payment. The teams running these programs are internally incentivized to maximize payouts; the payouts are evidence that the system is working. If you're denied a payment --- for a large company, at least --- there's something else going on.
The thing to keep in mind is that large-scale bug bounty programs make their own incentive weather. People game the hell out of them. If you ack and fix sev:info bugs, people submit lots more sev:info bugs, and now your security program has been reoriented around the dumbest bugs --- the opposite of what you want a bounty program to do.
In my (admittedly limited) experience, whilst payouts for bugs might be seen as a positive internally, payments for bad architecture/configuration choices are less so (perhaps as they're difficult to fix, so it's politically not expedient to raise them internally).
To provide one example I reported to a large cloud provider that their managed Kubernetes system exposed the Insecure port to the container network, meaning that anyone with access to one container automatically got cluster-admin rights. That pretty clearly seems like not a good security choice, but probably hard to fix if they were relying on that behaviour (which I'm guessing they were).
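For anyone unfamiliar: the "insecure port" was the API server's legacy unauthenticated HTTP listener, so from any pod that could reach it you could do something like this (address is illustrative):

    # The legacy insecure port (historically 8080) skips authentication and
    # authorization entirely, so any caller is effectively cluster-admin.
    curl -s http://<apiserver-ip>:8080/api/v1/secrets | head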
Their response was to say it was "best practice" behaviour (no bounty applicable), that they'd look to fix it, and to ask me not to mention it publicly. Then they deprecated the entire product 6 months later :D
That's one example but I've seen similar behaviour multiple times for things that are more architecture choices than direct bugs, which makes me think reporting such things isn't always welcome by the program owners.
Repeating myself: this almost certainly has nothing at all to do with the money they'd have to give you (I assure you, if there's even a whiff of legitimacy to your report, the people managing the bounty would probably strongly prefer to pay you just to get you off their backs) and everything to do with the warped incentives of paying out stuff like this. People forget that the whole point of a bug bounty is that the rewarded bugs get fixed; the bounty is directing engineering effort. If it directs them to expensive work they already made a strategic decision not to do, the bounty is working against them.
You would prefer this company to have made a different strategic choice about what to spend engineering time on, and that's fine. But engineering cycles are finite, so whatever time they'd spend configuring K8s differently is time they wouldn't be spending on some other security goal, which, for all we know, was more important. Software is fathomlessly awful, after all.
Security disclosures are like giving someone an unsolicited gift. The receiver is obligated to return the favor.
But if you buy someone non-refundable tickets to a concert they already have tickets for, you aren't owed compensation.
Exactly.
Not at all. This is a very toxic expectation.
Security disclosures are like telling someone they have a spot on their face. It's not always welcome, and there's no obligation on anyone to do so, nor anyone to return the favor.
In this case, the spot turned out to be a freckle, which everyone involved already knew was a freckle (since it was documented), and if anyone owes anyone anything, it's the researcher that owes github for wasting their time.
Disagree. This is obviously a deliberate design choice with obvious implications. Expecting a bounty for reporting this is unreasonable. These kinds of beg bounties are exactly what give security "researchers" a bad name.
The security implications are also minor. The only real problem is with making a fork of a private repo public - that should only make what exists in that fork public, not any other objects. Something that was already public staying public even when you delete it from your repo is not a security issue at all. Keys that have ever been pushed to a public repo should be revoked no matter what, with or without this GitHub feature.
I reported a variant of this issue that (to me) was unexpected:
* You add someone to your private repo.
* After some time, you revoke their access.
As long as they keep a fork (which you can't control) they can use this same method to access new commits on the repo and commits from other private forks.
Back in 2018, this was resolved as won't-fix, but it also wasn't documented.
I wasn't really expecting a bounty, more so hoping they'd fix the issue. For example, to this day I keep having to tell people to never fork the Unreal Engine repository, instead making a manual copy, just in case.
This causes lots of problems for repositories that are private with the expectation that companies will make private forks with their own private changes.
Someone once pushed a bunch of console SDKs (under strict NDA) to a private fork without knowing this. Now that code is just there, if you can guess the commit hash, forever. Literally nothing can be done to remove it. Great.
companies vary wildly in their honesty and cooperation with bug bounties and develop reputations as a result. if they have a shit reputation, people stop doing free work for them and instead focus on more honest companies
Not all free work is wanted. Discouraging frivolous reports is exactly what is being accomplished by not paying for them.
It's not just GitHub and it's not just because they don't want to pay bug hunters. In my career, I have escalated multiple bugs to my employer(s) in which the response was 'working as intended'. And they wouldn't have to pay me another cent if they acknowledged the issue.
In my experience, there were two reasons for this behavior:
1. They don't want to spend dev cycles on something that isn't directly related to revenue (e.g. security).
2. Developers don't have the same mindset as someone whose whole job is security, so they think something is fine when it's really not.
For moral reasons, historically I never wrote POCs or threatened disclosure.
For companies like Microsoft, which a CSRB audit found had an 'inadequate' security culture, the risk of disclosure with a POC is about the only tool we have to enforce their side of the Shared Responsibility Model.
Even the largest IT spender in the world, the US government, has moved from the carrot to the stick model. If they have to do it, so do we.
Unfortunately, since publishing a 'bad practices' list ourselves doesn't invoke the risk of EULA-busting gross negligence claims, responsible disclosure is one of the few tools we have.
The issue had been reported at least twice and was clearly documented. GitHub knew about this and had known for years. Their replies to the two notifications were even very similar.
GitHub clearly knew. Would you prefer that a vendor lie?
I didn't find anything mentioning it online at the time. But there wasn't much time and dedication involved either, to be fair. I discovered it completely by accident when I combined a commit hash from my local client with the wrong repository URL and it ended up working.
There seems to be no such thing as a "private fork" on GitHub in 2024 [1]:
[1] https://docs.github.com/en/pull-requests/collaborating-with-...
A fork of a private repo is private. When you make the original repo public, the fork is still a private repo, but the commits can now be accessed by hash.
According to the screenshot in the documentation, though, new commits made to the fork will not be accessible by hash. So private feature branches in forks may be accessible via the upstream that was changed to public, if those branches existed at the time the upstream's visibility changed, but new feature branches made after that time won't be accessible.
OK but say a company has a private, closed source internal tool, and they want to open-source some part of it. They fork it and start working on cleaning up the history to make it publishable.
After some changes which include deleting sensitive information and proprietary code, and squashing all the history to one commit, they change the repo to public.
According to this article, any commit on either repo which was made before the 2nd repo was made public, can still be accessed on the public repo.
I know this might look like a valid approach at first glance, but... it is obviously flawed to anyone who knows how git or the GitHub API works. The remote (GitHub's) reflog is not GC'd immediately; you can get commit hashes from the events history via the API, and then fetch those commits from the reflog.
You need to know both how git works and how GitHub's API works. I would say I have a pretty good understanding of how (local) git works internally, but I was deeply surprised by GitHub's brute-forceable short commit IDs and the existence of a public log of all reflog activity [1].
When the article said "You might think you’re protected by needing to know the commit hash. You’re not. The hash is discoverable. More on that later." I was not able to deduce what would come later. Meanwhile, data access by hash seemed like a non-issue to me – how would you compute the hash without having the data in the first place? Checking that a certain file exists in a private branch might be an information disclosure, but it is not usually problematic.
And in any case, GitHub has grown so far away from its roots as a simple git hoster that implicit expectations change as well. If I self-host my git repository, my mental model is very close to git internals. If I use GitHub's web interface to click myself a repository with complex access rights, I assume they have concepts in place to thoroughly enforce these access rights. I mean, GitHub organizations are not a git concept.
[1] https://www.gharchive.org/
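To make "brute-forceable short commit IDs" concrete, a rough sketch (hypothetical repo; the point is that GitHub resolves very short SHA prefixes, reportedly as short as four hex characters, so the search space is tiny):

    # Enumerate all 65,536 four-character prefixes and see which ones
    # GitHub resolves to a commit page somewhere in the fork network.
    for p in $(printf '%04x\n' $(seq 0 65535)); do
        code=$(curl -s -o /dev/null -w '%{http_code}' \
            "https://github.com/some-org/some-repo/commit/$p")
        [ "$code" = "200" ] && echo "prefix $p resolves to a commit"
    done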
No; just knowing how git works is enough to understand that force-pushing squashed commits or removing branches on remote will not necessarily remove the actual data on remote.
GitHub API (or just using the web UI) only makes these features more obvious. For example, you can find and check commit referenced in MR comments even if it was force-pushed away.
Short commit IDs are not a GitHub feature; they are a git feature.
Have you ever tried to make a private GitHub repository public? There is a clear warning that code, logs and activity history will become public. Maybe they should include an additional clause about forks there.
Yes, even though I expect there to be people that do exactly what the GP describes, if you know git it has severe "do not do that!" vibes.
Do not squash your commits and make the repository public. Instead, make a new repository and add the code there.
Why not just create a new public repo and copy all of the source code that you want to it?
ChatGPT, given the following repo, create a plausible, perfect commit history to create this repository.
Not through the GitHub interface, no. But you can copy all files in a repository and create a new repository. IIRC there's a way to retain the history via this process as well.
All you should have to do is just clone the repo locally and then create a blank GitHub repository, set it as the/a remote and push to it.
You can create a private repository on GitHub, clone it locally, add the repo being "forked" from as a separate git remote (I usually call this one "upstream" and my "fork", well, "fork"), fetch and pull from upstream, then push to fork.
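Roughly, one way to do it (hypothetical names; cloning the upstream first avoids bootstrapping from an empty repo):

    # "upstream" = the original project, "fork" = your empty private repo on GitHub
    git clone https://github.com/original-owner/project.git my-private-copy
    cd my-private-copy
    git remote rename origin upstream
    git remote add fork https://github.com/you/my-private-copy.git
    git push fork main                 # assuming the default branch is main

    # later, to pick up upstream changes:
    git fetch upstream
    git merge upstream/main            # or rebase
    git push fork main

Since the two repos are never linked by GitHub's fork feature, they don't share a repository network.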
That's not the GitHub concept / almost trademark of "fork" anymore though, which is what your parent was talking about
I mean, it's git: just git init, git remote add for origin and upstream (origin pointing to your private repo), git fetch upstream, git push to origin.
That's beside the point. The article is specifically about "GitHub forks" and their shortcomings. It's unrelated to pushing to distinct repositories not magically 'linked' by the GH "fork" feature.
Am I the only one who finds this conceptually confusing?
Nope, me too. The whole repo network thing is not user-facing at all. It is an internal thing at GitHub to allow easier pull requests between repos. But it isn't a concept git knows, and it doesn't affect GitHub users at all except for this one weird thing.
I may be recalling incorrectly but I seem to remember it having some storage deduplication benefits on the backend.
Funnily enough the docs are wrong; the GitHub CLI allows changing a fork's visibility: https://stackoverflow.com/a/78094654/12846952
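Presumably something along these lines (exact flags depend on your gh version; newer releases ask for an extra confirmation flag before changing visibility):

    gh repo edit your-user/your-fork --visibility private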
What does "private fork" mean in this context? I created a fork of a project by cloning it to my own machine and set origin to an empty private repository on GitHub. I manually merge upstream changes on my machine.
Is my repository accessible?
Because you never git-pushed to the fork, it's not aware of your repo; you're ok.
What I don't know is this: if in 3 months you DO set your remote origin to that fork to, for instance, pull upstream patches into your private repo, you're still not pushing, only pulling, so I would THINK they'd still never get your changes. But I don't know if git does some sort of log sync when you do a pull as well.
Maybe that would wind up making the commit hash available.
It's not. The feature here works because a network of forks known to GitHub has unified storage; that's what makes things like PRs work transparently and keep working if you delete the fork (kinda, it closes the PR but the contents don't change).
then it's fine
the issue is the `fork` mechanism of github is not semantically like a `git clone`
it's more like creating a larger git repo in which all forks, whether private or not, are contained, and which doesn't properly implement access management (at least points 2 & 3 wouldn't be an issue if they did)
there are also some implications from point 1 that forks in some way interfere with GC-ing orphan commits (e.g. the non-synced commits in the deleted repo in point 1); at least that should be a bug IMHO, one which also costs them storage
(also, to be clear: for me, 2 & 3 are security vulnerabilities no matter whether they are classified as intended behavior)
No, that would be the "copy the repository" approach. Private fork is when you do it through their UI.
As far as I know, it is not accessible.
I reported a different security issue to github, and they responded the same (although they ultimately ended up fixing it when I told them I was going to blog about the "intended behavior").
What "intended behaviour" was that, specifically?
Did you end up getting a bug bounty out of it?
It would not even be that hard to fix it; private forks should always just be automatically copied on first write. You might lose your little link to the original repo, but that's not as bad as unintentionally exposing all your future content.
Yup, we can close the thread and ack that GitHub does not care.
To be fair, in the true git sense, if a "fork" is really just a branch, deleting the original completely would also mean deleting every branch (fork) completely
obviously not a fan of this policy though
But a fork is really not a branch. It's a copy of a repo with one remote pointing at the original on GitHub, but that doesn't need to happen.
My conclusion would be: don’t use GitHub.
Imho there is an issue with the word "delete". Apparently, for anyone hosting someone else's (private and/or sensitive and/or valuable) data, "delete" just means hiding it from view while keeping it around "just in case" or "because we can" or "what are you gonna do about it".
I 'love' it when I see the words "hide", "archive", "remove", and other newspeak used to avoid the word "delete", since 'they' never actually delete (plus there are 1-2-5-10-forever years of backups from which your 'deleted' info can be retrieved relatively easily).