The key to this attack is: "The result of these settings is that, by default, any repository contributor can execute code on the self-hosted runner by submitting a malicious PR."
Problem: you need to be a "contributor" to the repo for your PR to trigger workflows without someone approving them first.
So: "We needed to be a contributor to the PyTorch repository to execute workflows, but we didn’t feel like spending time adding features to PyTorch. Instead, we found a typo in a markdown file and submitted a fix."
I really don't like this aspect of GitHub, that people who have submitted a typo fix gain additional privileges on the repo by default. That's something GitHub can fix: I think "this user gets to trigger workflow runs without approval in the future" should be an explicit button repo administrators need to click. Maybe in the PR flow there could be "Approve this run" and "Approve this run and all future runs by user X" buttons.
The vast majority of repos should be able to run CI on pull requests with no privileges at all. GitHub can manage any resource utilization issues on their end.
Is the issue here that a self-hosted runner was needed for some hardware tests?
The problem is that there are fundamentally 2 different kinds of builds, but the current tooling is weak:
* pre-merge builds on PRs. These should not have privileges, but making the distinction between the two cases requires a lot of care.
* official builds of "master" or a feature branch from the main repo. These "need" privileges to upload the resulting artifacts somewhere. Of course, if all the privileged step did was wake up some daemon elsewhere, which could download straight from the CI in a verified way based on the CI's notion of the job name, it would be secure without privileges; but most CI systems don't want to keep huge artifacts around, and maintaining the separate daemon is also annoying. (See the sketch of the split below.)
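A minimal sketch of that split, with placeholder scripts (`build.sh`, `test.sh`, `upload.sh`) and a made-up secret name UPLOAD_TOKEN: the PR workflow gets a read-only token and never references a secret, while only the main-branch workflow can reach the upload credential.

```yaml
# .github/workflows/pr.yml -- untrusted pre-merge builds
name: pr-build
on: pull_request
permissions:
  contents: read            # read-only GITHUB_TOKEN; no secrets referenced anywhere
jobs:
  build:
    runs-on: ubuntu-latest  # GitHub-hosted, ephemeral
    steps:
      - uses: actions/checkout@v4
      - run: ./build.sh && ./test.sh   # placeholder build/test scripts
---
# .github/workflows/release.yml -- trusted builds from the main repo only
name: release-build
on:
  push:
    branches: [main]
permissions:
  contents: read
jobs:
  build-and-upload:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: ./build.sh
      - run: ./upload.sh             # only this workflow ever sees the credential
        env:
          UPLOAD_TOKEN: ${{ secrets.UPLOAD_TOKEN }}
```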
It’s called GitHub secrets.
Builds off of main get the secrets, pull requests from randos don’t.
And public repos don’t pay for CI on GitHub.
Not rocket science, people.
PyTorch did use GH secrets for the valuables, and you can see that this wasn't enough, right there in the OP, because the self-hosted runners are still shared.
Yup! This is what makes this kind of attack scary and very unique to GitHub Actions. The baseline GITHUB_TOKEN just blows the door open on lateral movement via workflow_dispatch and repository_dispatch events.
In several of our other operations, not just PyTorch, we leveraged workflow_dispatch to steal a PAT from another workflow. Developers over-provision PATs so often; more often than not we'd end up with a PAT that has all scopes checked and org admin permissions. With that you could clean out all of the secrets from an organization in minutes using automated tools such as https://github.com/praetorian-inc/gato.
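For illustration, a minimal sketch of the lateral-movement primitive being described (the target workflow name `deploy.yml` is a placeholder): events created by GITHUB_TOKEN normally don't start new workflow runs, but workflow_dispatch and repository_dispatch are the documented exceptions, so a write-scoped token is enough to kick off other workflows in the repository.

```yaml
# Hypothetical workflow controlled by someone who already holds a write-scoped
# GITHUB_TOKEN (e.g. via persistence on a non-ephemeral runner).
name: pivot
on: push
permissions:
  actions: write            # needed to create workflow_dispatch events
jobs:
  pivot:
    runs-on: ubuntu-latest
    steps:
      - name: Trigger another workflow in the same repository
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: gh workflow run deploy.yml --repo ${{ github.repository }} --ref main
```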
To clarify, this secret stealing is not an issue with GitHub-hosted runners, correct?
Correct. For fork PR workflows on the pull_request trigger the GITHUB_TOKEN has read only permissions, so you can’t do anything with it.
The key thing with a non-ephemeral runner is that (after obtaining persistence) you can grab the GITHUB_TOKEN from a subsequent non-fork PR build or a build on another trigger, which will have write permissions unless restricted by the repository maintainers.
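That restriction can be made explicit per workflow (or set repo-wide in the repository's Actions settings); a minimal sketch, with `make test` as a placeholder:

```yaml
name: ci
on: [push, pull_request]
permissions:
  contents: read          # GITHUB_TOKEN is read-only for every job in this workflow
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: make test    # placeholder test command
```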
I think 'environments' was meant, where different GHA environments get different secrets, and policies dictate who gets to run which actions with which environments.
But that is real work to setup, audit, and maintain. It'd be better if, like phone app capabilities, the default would be no privs, any privs are explicitly granted, and if they aren't being used, the system detects that and asks if you want to remove specific ones.
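A rough sketch of that setup, assuming an environment named `production` that holds the deploy secret and has required reviewers configured as a protection rule (secret and script names are made up):

```yaml
name: deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    environment: production      # environment-scoped secrets only resolve here,
                                 # and protection rules gate the job before it starts
    steps:
      - uses: actions/checkout@v4
      - run: ./deploy.sh         # placeholder deploy script
        env:
          DEPLOY_KEY: ${{ secrets.DEPLOY_KEY }}
```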
If they use the same runners, couldn't the attacker just wait? The runners would need to be sequestered too.
Absolutely. The real difficulty is tests on PR are by definition remote code execution by an untrusted source, so a full risk analysis and hardening needs to be done.
Here's a similar mistake on an OSS repo a company I worked for made: https://goteleport.com/blog/hack-via-pull-request/
The smugness and overconfidence of someone who's about to be pwned?
Pwn me.
For the second point, PyPI's trusted publisher implementation does this very well: https://docs.pypi.org/trusted-publishers/
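For reference, a trusted-publisher workflow looks roughly like this (build commands are generic); the repo stores no PyPI credential at all, only an OIDC token that PyPI exchanges for a short-lived upload token:

```yaml
name: publish
on:
  release:
    types: [published]
jobs:
  pypi:
    runs-on: ubuntu-latest
    permissions:
      id-token: write       # required for OIDC-based trusted publishing
      contents: read
    steps:
      - uses: actions/checkout@v4
      - run: python -m pip install build && python -m build
      - uses: pypa/gh-action-pypi-publish@release/v1   # no API token or password stored anywhere
```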
Isn't that what's causing the problem?
Without that there would be no need to have an action to do an upload. It could be comfortably and safely done offline.
Ideally the builds for external PRs should not gain any access to any secret, but that is not always practical. For example, you may want to use Docker containers, which means downloading images from somewhere, say ubuntu:20.04 from Docker Hub; that requires a token, otherwise your requests may get throttled. Even accessing most GitHub services requires a token. I agree with you that these things require a lot of care. In reality most dev teams are not willing to put enough time into securing their build system, and that's why supply chain attacks are so common today.
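If throttled anonymous pulls are the only problem, one middle ground is a dedicated read-only Docker Hub access token, so the blast radius of leaking it is just rate-limit headroom. A sketch (the secret names are made up), keeping in mind that on the plain pull_request trigger fork PRs won't see repository secrets at all, which is exactly the tension described above:

```yaml
name: pr-docker-build
on: pull_request
permissions:
  contents: read
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: docker/login-action@v3
        continue-on-error: true     # fork PRs get no secrets; fall back to anonymous (throttled) pulls
        with:
          username: ${{ secrets.DOCKERHUB_USER }}        # made-up secret names
          password: ${{ secrets.DOCKERHUB_PULL_TOKEN }}  # a read-only access token, not a password
      - run: docker pull ubuntu:20.04
```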
Only because PyPI removed support for GPG, and their only "signature" now is basically "was uploaded from a GH action".
Otherwise a maintainer could do it offline without needing to give microsoft total access to uploading to pypi.
When there are no side effects and no in-container secrets and the hosting is free or reasonably limited to prevent abusers, ideally yes.
Outside that, heck no, that'd be crazy. You're allowing randos to run arbitrary code on your budget. Locking it down until it's reviewed is like step 1, they can validate locally until then.
GH actions are free on public repos.
Until someone tries to mine Bitcoin in yours, yeah
Do you have a case study?
My prior is that GitHub is pretty competent in understanding malicious PRs to open source repos, and wouldn’t penalize the repo owner without other evidence of wrongdoing.
They are so difficult. I wanted to stop random people from running code on my repository; I don't have any secrets or write access or anything to exploit, I just wanted to avoid burning quota.
The issue is that now the pull requests don't get tested at all. I have to manually, locally, get all the commits, make a branch on the main repository with them, and then the actions run.
That just plain sounds Bad™, yeah. In the buildkite setup I interact with, I can at least just hit the "continue" button on the paused pipeline, regardless of any other GitHub state... which feels like table stakes to me.
Self-hosted runners is the way to go, IMHO. Especially if you have bare metal resources. I love how fast my builds are with 16 cores, and gobs of ram.
What's the GitHub Actions tooling like for ephemeral self-hosted runners?
Afaict, a huge portion of this attack came from persistence on the self-hosted runner.
Absent that, they would have needed a container jailbreak as well, which substantially ups the difficulty.
And if a repo is running <100 builds a day, spin up + kill container seems a small per-build price to pay for the additional security isolation.
GitHub themselves don't seem to provide any mechanism to actually reset a runner between jobs. All they let you do is flag a runner as ephemeral, meaning it gets de-registered once a job completes; you still need to write your own tooling to wipe the environment yourself (either by starting a whole new runner in a fresh environment and registering that, or by wiping the existing runner and re-registering it).
https://docs.github.com/en/actions/hosting-your-own-runners/...
there are 3rd party foss options (1):
1. ephemeral + zero implicit trust (2) https://blog.openziti.io/my-intern-assignment-call-a-dark-we...
2. zero implicit trust: https://github.com/openziti/ziti-webhook-action
(1) disclosure, maintainer (2) zero implicit trust in this case = no open inbound ports on underlay; need to access via app-specific overlay which requires strong identity, authN, authZ
I've just made runs-on [1] for that purpose: self-hosted, ephemeral runners for GitHub Action workflows. Long-running self-hosted runners are simply too risky if your project is public.
[1]: https://runs-on.com
What privileges do they gain? Anything other than the one you mentioned?
This was definitely not obvious to me, so thanks for bringing it up!
This is more subtle, but there is an "author_association" field within Actions event contexts that can be one of:
NONE, CONTRIBUTOR, COLLABORATOR, MEMBER, OWNER
There are some cases where people use checks on that field as part of gating for workflows that run on pull_request_target/issue_comment, but confuse CONTRIBUTOR with COLLABORATOR (the latter requires explicitly adding someone to the repository). Ultimately this is a misconfiguration on the part of the maintainer, but it's another example where fixing a typo can play a part in an attack.
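For concreteness, a hedged sketch of the kind of gate being described (workflow and job names are made up): the dangerous version adds CONTRIBUTOR, which anyone with a single merged PR such as a typo fix has, to a list that was meant to contain only explicitly-added people.

```yaml
name: privileged-pr-task
on: pull_request_target        # runs with the base repository's privileges, hence the gate
jobs:
  privileged-task:
    runs-on: ubuntu-latest
    # Only people explicitly granted access to the repository:
    if: contains(fromJSON('["COLLABORATOR","MEMBER","OWNER"]'), github.event.pull_request.author_association)
    # Common bug: also accepting "CONTRIBUTOR" here, which a merged typo fix is enough to earn.
    steps:
      - run: echo "write-scoped token and secrets are available here"
```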
Thanks, that makes more sense than automatically granting privileges. It definitely seems easy to mix up those two terms, however!
They do care about this stuff though - a while ago I pointed out that `pr_closed` was run with the workflows from the PR(^); it was fixed promptly.
^So you could reject a PR that tries to add a bitcoin-mining workflow only to have it start bitcoin-mining on close. Or leaking secrets, releasing nonsense, etc. (I don't recall if non-/contributor status was also a factor. I think it was also ignoring that check.)