I'm a maintainer (one of many) of an open source project, and this topic has been on my mind a lot lately as I review PRs.
I am more suspicious of PRs from new contributors by default now. Of course I keep these suspicions to myself, but besides simply reviewing code for all the regular things, I now ask myself "what sort of sneaky thing could they be doing that appears benign on the surface?"
That's great that you are considering this more now.
But the xz story taught us that every contributor is dangerous, and the most dangerous ones are probably the most helpful and most skilled contributors. If someone barely gets a PR accepted, they probably lack the skills to add a sophisticated backdoor.
Another thing that was not talked about a lot: There are many ways to compromise existing maintainers. Compromising people is the core competency of intelligence, happens all the time, and most cases probably never come to public knowledge.
I'm sorry, but this is just fear-mongering with no basis in reality. So we've had what, two incidents (xz and eventstream) in how many years? And now maybe possibly perhaps a third attempt? Well whoop-die-doo. It's not evidence of anything other than the process not being watertight, but no one reasonable ever thought that in the first place.
Every human being you meet on the street can stab you in the eye. Literally nothing stopping anyone from doing that. You need to separate what can happen from what actually happens.
If someone gets stabbed in the eye, we find out about it. So our statistics on eye-stabbing are probably accurate.
We literally have no idea how many xz-style compromises are out there in the wild. We got really lucky with xz - it was only found because the backdoor was sloppy with performance and a Microsoft employee got curious. But we have no data on all the times we got unlucky. How many packages in the Linux ecosystem are compromised in this way? Maybe none? Maybe lots? We just don't know.
You can always use the "we have no idea" argument because you can't prove something doesn't exist. Go find evidence. It's been over a month since xz and thus far we have zero additional incidents. And if you look at the specifics of the xz attack: that wouldn't work for most projects because most don't have binary test files.
Are people really looking though? Are all open source libraries being run through extensive performance profiling to look for known heuristics? Are they being looked at line by line for aberrations?
I don’t have confidence that people are looking for evidence of potential exploitation because of reasons like the ones you bring up.
So we’re back to we just don’t know.
With hindsight it's not the runtime behaviour of the library that you'd want to test - the weakest point in the chain is where the distributed source .tar.gz can't be regenerated from the project repository.
For how many projects is that actually checked? I bet barely any.
It's especially difficult because most projects aren't built in a reproducible way. You should be able to uncompress and compare a source tarball. But if you get a binary and the source code used to generate that binary, there's no way to tell that they match.
Luckily the source tarball is the more important one to check, because that's the difference between backdooring one distribution and backdooring them all.
It's still not trivial because there might well be legitimate processing steps that are used to create the tarball, but it should be doable.
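To make the tarball check concrete, here is a minimal sketch, assuming the release tarball wraps everything in a single top-level directory and that a checkout of the matching tag sits on disk. The function names are made up for illustration. As noted above, legitimately generated files (a pre-built `configure`, for example) will show up as "only in tarball", so the output is a list of things to look at, not a verdict.

```python
import hashlib
import tarfile
from pathlib import Path


def tarball_hashes(tar_path, strip_components=1):
    """Map each regular file in the tarball to the SHA-256 of its contents."""
    hashes = {}
    with tarfile.open(tar_path, "r:*") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue
            # Drop the leading "project-1.2.3/" directory (an assumption about layout).
            parts = Path(member.name).parts[strip_components:]
            if not parts:
                continue
            data = tar.extractfile(member).read()
            hashes["/".join(parts)] = hashlib.sha256(data).hexdigest()
    return hashes


def tree_hashes(root):
    """Map each file in a checkout (ignoring .git) to the SHA-256 of its contents."""
    root = Path(root)
    hashes = {}
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        if path.is_file() and ".git" not in rel.parts:
            hashes[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return hashes


def compare(tar_path, checkout):
    """Return (files only in the tarball, files whose contents differ)."""
    released = tarball_hashes(tar_path)
    repo = tree_hashes(checkout)
    only_in_tarball = sorted(set(released) - set(repo))
    modified = sorted(
        name for name in released.keys() & repo.keys()
        if released[name] != repo[name]
    )
    return only_in_tarball, modified
```

Every entry in either list then needs an explanation: either a documented build step produced it, or someone should be asking why it is there.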
Most commonly-used projects are watched by a bunch of people, or diffed on updates. These are not in-depth reviews, but should catch most of it. So yes, people are looking, and have been looking for a long time.
The reason Jia Tan could do their thing is because 1) the main meat was in a binary test file, 2) the code to use that seemed relatively harmless at a glance, and 3) people were encouraged to use the .tar.gz files instead of git clone. Also you need to actually get maintainer status, which is not as easy as it sounds.
I've been thinking of inserting a "// THIS LINE IS MALICIOUS, PLEASE REPORT IF YOU SEE IT" in some of my projects to see how long it would take. I bet it would be pretty fast either after commit or after tagging a release.
Tools that use LLMs to review code will catch such projects.
Maybe
It’s worse than that, and that wouldn’t be enough.
A large class of exploitation methods simply have no performance impact.
I'm nobody so you have no reason to believe me - but there have indeed been other, very prominent projects targeted in very similar attacks. We're still inside the responsible disclosure window.. hell, even in the blog post we're commenting on, three JS projects were targeted in failed attempts. That's 4 public projects now..
Suspected to be targeted, in a way that seems to have 0% chance of succeeding for almost any project. Which is why nothing happened.
It's obviously more than 0% given xz was successfully taken over and backdoored. Even a 5% chance of malicious takeover per project would make the situation pretty worrying given how many well funded, motivated government agencies are out there.
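To put rough numbers on that: if each of n projects were taken over independently with probability p, the chance that at least one is compromised is 1 - (1 - p)^n. The 5% is the hypothetical figure from the comment above, and the project counts and the independence assumption are purely illustrative, not estimates of anything.

```python
def p_at_least_one(p, n):
    """Probability that at least one of n independent projects is compromised."""
    return 1 - (1 - p) ** n


# Illustrative only: 5% per project, for a few made-up dependency counts.
for n in (1, 10, 50, 100):
    print(f"{n:>3} projects: {p_at_least_one(0.05, n):.3f}")
```

The point is only about shape: a small per-project probability compounds quickly across a large dependency tree, which is why the disagreement over whether the per-project number is closer to 0% or 5% matters so much.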
I'm not talking about xz, I'm talking about that OpenJS thing: random people emailing out of the blue "plz gimme maintainer". Entirely different situation.
I did quote the "three JS projects were targeted in failed attempts" bit, which should have made that abundantly clear.
History may record XZ alongside Spectre/Meltdown as industry turning points for "too wide to see".
And xz wasn't the first. Several attempts have been made to put garbage in the kernel.
No. If there is strong incentive to compromise, and little to no chance a compromise is being found, it's statistically most likely to assume compromises happen on a regular basis and only rarely are found out.
It did at least reveal the playbook, and that you have to get pretty creative to hide things in plain sight.
I'm sure any binary blobs in OSS software, no matter what the reason for having them, will be viewed with suspicion, and build scripts get extra inspection after that.
Maybe I'm naive in thinking that some people are already looking into packages that are included in all base Linux builds? Including simplifying the build env, and making sure that the build tools themselves (cmake, pkgconfig, gmake, autotools etc) are also not compromised.
The de facto standard serialization library for Rust, serde, started using binary blobs to speed up builds only a few months before the xz back door was discovered. Lots of people asked the author to include build scripts so they could (re)generate the blobs on their own and his response was basically if you want it, fork it.
We know about the failed attempts, we have no idea about the successful ones, and the ones that are going to be successful in the future.
You can always use this line because you can never prove something doesn't exist. Go find evidence. It's been over a month.
Your choice of language in your comments (in this thread, not in general) isn’t bolstering your argument.
Why not be curious rather than just dismissive? This seems to be people just talking past each other at this point.
There have been a lot of changes in the last ~five years that point in the direction of supply chain security being at greater risk.
Evidence comes in many forms. The relevance of evidence depends on what part of the problem you are looking at.
Also, it is rational to talk about the probability by which different evidence is likely to be surfaced!
I think it is possible you are sensitive to people making such claims for self-interested purposes. Fair? But I don’t think it’s fair to assume that of commenters here.
Yeah, you're probably not wrong. I've had this argument a few times now, and it's the same dismissive "we don't know what we don't know" every time. Well, you can say that for everything and given the complexities of the xz attack that seems a bit unlikely to me, which is then again countered with "but we don't know!!11"
"Every contributor is dangerous" is a spectacularly toxic type of attitude. I've already seen random people be made a target and even had their employers contacted over this before they even had a chance to explain(!!) To say nothing of "there are many ways to compromise existing maintainers. Compromising people is the core competency of intelligence, happens all the time" – so great, now I'm also potentially dangerous after spending untold hours and money over the last 20 years because I could be compromised. Great.
This was never a nuanced conversation about risk management to start with. This is not the type of community I've worked for all this time. "Let's use some common-sense tech so this isn't that easy". Sure, let's talk about that. "Let's treat every volunteer involved as potentially hostile and compromised after we've seen a single incident"? Yeah, nah.
Thanks for your thoughtful reply.
I view this from the lens of "How well can people reason about probabilities?" and research has shown, more or less, "not very well". In the short term, therefore, it is wise to tailor communications so as to avoid predictable irrational reactions. In the medium term, we need to _show_ people how to think about these questions rationally, meaning probabilistically.
For what it is worth, I prefer to avoid using the phrase "common sense", as it invites so many failure modes of thinking.
My current attitude is, more or less, "let's put aside generalizations and start talking about probabilities and threat models". This will give us a model that makes _probabilistic predictions_. Models, done well, serve as concrete artifacts we can critique and improve _together_.
I hope to see some responses to my other comment at https://news.ycombinator.com/item?id=40271146 but I admit it takes more effort to share a model. It is well outside the usual interaction pattern here on HN to make a comment with a testable prediction, much less a model for them! Happily, there are online fora that support such norms and expectations, such as LessWrong. But I haven't given up hope on HN, as it seems like many people have the mindset. I think the social interaction pattern here squanders a lot of that individual intelligence, unfortunately... but that pattern can change in a bottom-up fashion as people (more or less) demand, at the very least, clearer explanations.
I'm not quite following the second sentence. What kind of community have you worked for? Do you mean "worked for" as in e.g. "the spirit of your comments on HN"? Or something else?
You have evidence of a state-sponsored attack which was only discovered because we got extremely lucky, and you’re not worried?
The attack itself is, frankly, the evidence. It’s sort of like how we expect there to be life on other planets because there is life on earth.
You're really going to pretend like there have been no socially-engineered cybersecurity attacks in the last 30 years...?
And by the way, stabbings happen all the time, at least 3 per day. Stabbings hurt a few people, cybersecurity incidents can hurt millions.
This is about "social engineering takeovers of open source projects", not "socially-engineered cybersecurity attack", which is much much broader.
I've been pretty clued up on open source for the last 20 years, and I don't really recall any other similar incidents other than the two I mentioned. I tried to find other examples a few weeks ago and came up empty-handed. It's certainly not common. So please do post specifics if you know of additional incidents, because from what I can see, it's exceedingly rare.
You seem super confident that there have been zero similar attacks that achieved their goals without detection. By definition, almost anyone who pulled off this kind of thing would try really hard not to burn that backdoor by being super obvious (for instance, using it to deface a website). We literally would not know anything about it, in all likelihood. Therefore I feel like it’s a lot more intellectually honest to say we have no idea if that has happened elsewhere, than it is to confidently proclaim that it certainly has not just because it’s been a month since xz.
What I'm arguing against is absolutist fear-mongering statements such as "every contributor is dangerous".
I'm not confident about anything, but anything could happen or have happened all the time. We need to operate on the reality that exists, not the reality that perhaps maybe possibly could perhaps maybe possibly exist. And we certainly shouldn't be treating anyone sending you a patch as a dangerous hostile actor by default.
There are CVEs where an empty string performed an authentication bypass.
> social engineering
The best bugdoors are deniable.
This is specious reasoning.
You're only complaining you only heard of two incidents.
What you're really pointing out is that this attack vector works reliably well and is reproducible across projects.
You're also pointing out that this attack vector will continue to work until something is done to mitigate it.
I really do not understand what point you think you are making.
Yea. It would almost be strange if a security service didn't consider the route of getting "kompromat" on a developer to make them "help" them.
I suppose that’s an option, but it also introduces an additional risk of exposure for your operation as it doesn’t always work and makes it much more complicated to manage even when it does work.
Does it matter though? They don’t have to say “I am so and so of the Egyptian intelligence service and would like to blackmail you”
They might not even use blackmail, they might just "help out" in a difficult financial situation. Some people are in severe debt, have a gambling problem, are addicted to expensive drugs, or might need a lot of money for a sick relative. There are many possibilities.
The trick is finding the people that can be compromised.
I think you're going overboard on what's required. Take anybody who is simultaneously offered a substantial monetary incentive (let's say 4 years of total current/vesting comp), and also threatened with the release of something that we'll say is little more than moderately embarrassing. And this dev is being asked to do something that stands basically 0 risks of consequences/exposure for himself due to plausible deniability.
For instance, this is the Heartbleed bug: "memcpy(bp, pl, payload);". You're copying (horrible naming conventions) payload bytes from pl to bp, without ensuring that the size of pl is >= payload, so an attacker can trivially get random bytes from memory. Somehow nobody caught one of the most blatant overflow vulnerabilities, even though memcpy calls are likely one of the very first places you'd check for this exact issue. Many people think it was intentional because of this, but obviously there's zero evidence, because it's basically impossible for evidence for this to exist. And so accordingly there were also 0 direct consequences, besides being in the spotlight for a few minutes and having a bunch of people ask him how it felt to be responsible for such a huge exploit. "It was a simple programming mistake" ad infinitum.
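For anyone who hasn't looked at this class of bug, here is a minimal sketch of the missing check, modelled in Python rather than the C of the real code. The function name, the `memory` buffer and the record layout (one type byte, a two-byte claimed length, then the payload) are simplifications invented for illustration, and the padding the real protocol requires is ignored. Python slices can't read out of bounds the way `memcpy` can, so the bytes that follow the request in `memory` stand in for whatever happens to sit next to it in the process.

```python
import struct

HEADER_LEN = 3  # 1 type byte + 2-byte claimed payload length (simplified layout)


def heartbeat_response(memory, offset, record_len, check_bounds=True):
    """Echo back the payload of a heartbeat request stored in `memory`.

    record_len is the number of bytes the peer actually sent; the claimed
    payload length comes from the attacker-controlled header.
    """
    _msg_type, claimed = struct.unpack_from(">BH", memory, offset)
    if check_bounds and HEADER_LEN + claimed > record_len:
        return None  # claimed length exceeds what was received: discard
    start = offset + HEADER_LEN
    # Without the check above, this copies `claimed` bytes no matter how
    # short the real payload was, which is the memcpy(bp, pl, payload) mistake.
    return bytes(memory[start:start + claimed])


# The request claims a 64-byte payload but only carries 2 bytes.
request = struct.pack(">BH", 1, 64) + b"hi"
memory = bytearray(request + b"SECRET-KEY-MATERIAL" + bytes(64))

leaked = heartbeat_response(memory, 0, len(request), check_bounds=False)
safe = heartbeat_response(memory, 0, len(request))
```

With the check disabled, `leaked` contains the two real payload bytes followed by the neighbouring "secret". With it enabled, the malformed request is dropped. The whole difference is one comparison, which is exactly why intent is so hard to argue either way.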
So, in this context - who's going to say no? If any group, criminal or national, wanted to corrupt people - I really don't think it'd be hard at all. Mixing the carrot and the stick really changes the dynamics vs a basic blackmail thing where it's exclusively a personal loss (and with no guarantee that the criminal won't come back in 3 months to do it again). To me, the fact we've basically never had anybody come forward claiming they were a victim of such an effort means that no agency (or criminal organization) anywhere has ever tried this, or that it works essentially 100% of the time.
This doesn't look intentional at all, because this is basically like how 90% of memory disclosure bugs look
Absolutely. And that's the point I'm making here. It is essentially impossible to discern between an exploit injected due to malice, and one injected due to incompetence. It reminds one of the CIA's 'simple sabotage field manual' in this regard. [1] Many of the suggestions look basically like synopses of Dilbert sketches, written about 50 years before Dilbert, because they all happen, completely naturally, at essentially any organization. The manual itself even refers to its suggestions as "purposeful stupidity." You're basically exploiting Hanlon's Razor.
[1] - https://www.openculture.com/2015/12/simple-sabotage-field-ma...
I suppose the point is that even though any given instance of an error like this is overwhelmingly likely to be an innocent mistake, there is some significant probability that one or two such instances were introduced deliberately with plausible deniability. Although this amounts to little more than the claim that "sneaky people might be doing shady things, for all we know", which is true in most walks of life.
Nobody can tell if they are intentional or accidental.
If the target knows or suspects what you’re asking them to do is nefarious then you still run the same risks that they talk before your operation is complete. It’s still far less risky to avoid tipping anyone else off and just slip a trusted asset into a project.
No, but practically by definition the target has to know they’re being forced to “help” and therefore know someone is targeting the project. Some percentage of the time the target comes clean about whatever compromising information was gathered about them, which then potentially alerts the project to the fact they’re being targeted. When it does work you have to keep their mouth shut long enough for your operation to succeed which might mean they have an unfortunate accident, which introduces more risks, or you have to monitor them for the duration which ties up resources. It’s way simpler just to insert a trusted asset into a project.
I would guess there are many projects they could target at any given time.
The more projects they target, the more risk of being flagged and of preventive measures being engaged by counterintelligence etc.
I’m reading all this with sadness realizing that one of the Internet’s last remaining high trust spaces is being destroyed.
> one of the Internet’s last remaining high trust spaces is being destroyed
One of the Internet's last remaining high trust spaces is being attacked.
What happens next is still unwritten.
From what I know of today's developer culture the solution will be for one company, probably Microsoft given their ownership of GitHub, to step in and become undisputed king and single point of failure for all open source development. Developers will say this is great and will happily invite this, with security people repeating mantras about how securing things is "hard" and "Microsoft has more security personnel than we do." Then MS will own the whole ecosystem. Anyone objecting will be called old or a paranoid nut. "This is how we do things now."
As a positive counterexample, the US recently reduced federal funding for the program which manages CVEs [1]. There was/is a risk of CVE data becoming pay-for-play, but OSS developers have also pushed for decentralization [2]. A recent announcement is moving in the right direction, https://medium.com/@cve_program/new-cve-record-format-enable...
> solution will be for one company, probably Microsoft given their ownership of GitHub, to step in and become undisputed king and single point of failure for all open source development.
A single vendor solution would be unacceptable to peer competitors who also depend on open-source software. A single-foundation (like LF) solution would also be sub-optimal, but at least it would be multi-vendor. Long term, we'll need a decentralized protocol for collaborative development, perhaps derived from social media protocols which support competing sources of moderation and annotation.
In the meantime, one way to decentralize GitHub's social features is to use the GH CLI to continually export community content (e.g. issue history) as text that can be committed to a git repository for replication. Supply chain security and identity metadata can then be layered onto collaboration data.
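A rough sketch of what that export could look like, assuming the `gh` CLI is installed and authenticated. The file layout (one Markdown file per issue, zero-padded by number) and the function names are my own choices for illustration, not an established format, and comments, labels and pagination beyond `--limit` are left out.

```python
import json
import subprocess
from pathlib import Path


def fetch_issues(repo, limit=1000):
    """Fetch issues for `repo` ("owner/name") as a list of dicts via the gh CLI."""
    result = subprocess.run(
        ["gh", "issue", "list", "--repo", repo, "--state", "all",
         "--limit", str(limit), "--json", "number,title,state,author,body"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def write_issues(issues, out_dir):
    """Write one plain-text file per issue so the history can be committed to git."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for issue in sorted(issues, key=lambda i: i["number"]):
        author = (issue.get("author") or {}).get("login", "unknown")
        text = (
            f"# #{issue['number']}: {issue['title']}\n\n"
            f"state: {issue.get('state', 'unknown')}\n"
            f"author: {author}\n\n"
            f"{issue.get('body') or ''}\n"
        )
        path = out_dir / f"{issue['number']:05d}.md"
        path.write_text(text, encoding="utf-8")
        written.append(path)
    return written


# Example (not run here): write_issues(fetch_issues("owner/name"), "issues/")
# followed by a normal `git add issues/ && git commit` in the mirror repository.
```

Because the output is deterministic text, re-running the export and committing gives a diffable history of the discussion that can be replicated like any other repository.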
[1] https://www.darkreading.com/vulnerabilities-threats/nist-nee...
[2] https://github.com/yoctoproject/cve-cna-open-letter/blob/mai...
The most secure systems are those that are also resistant to rubber hose cryptography.
"Rubber Hose Cryptography" comes in the form of a PR.
"Rubber Hose Cryptanalysis" comes in the back door and waits for you in the dark.
No, it 'comes in the form of' a rubber hose...
They would be really bad at their job, if they didn't try.
Developers would be really bad at their newly expanded job, if they didn't resist.
Unfortunately it's easy to sandbag being dumb. Just because someone submits a PR defining constants for 0-999 does not mean they're actually bad at programming.
That person might just be an old school Java <5 developer.
That person might just be a regular Java developer who works on a project which onboarded Checkstyle, and can't disable its MagicNumber check.
https://checkstyle.sourceforge.io/checks/coding/magicnumber....
Man, I hate such tools. Do I run into problems when I try to convert seconds to minutes?
Larger problem than magic numbers ever could be.
Oh, now I get why you need those constants:
How incredibly wasteful.
You can form most useful numbers with just ten single-digit constants, some casting, and string concatenation.
Indeed, in python you can just eval.
Sure, but being known for submitting bad code is going to make code reviews more thorough, not less. It's drawing additional attention to yourself.
One follow up to compromising existing maintainers: This makes the creators or long-term good faith maintainers maybe even more "dangerous" than new maintainers.
Are we facing a Byzantine generals kind of situation now?
We have always faced it, it’s just that there's more awareness of the potential issues.
Also not talked about a lot - there are many ways to compromise existing software engineers who are paid to work on proprietary software systems.
It is a worthwhile reminder.
But, source-not-available proprietary systems are just totally hopeless from this point of view, of course an intelligence agency could slip something in. A bored developer at the company could too. Users of this sort of proprietary system have just chosen to have 100% faith for some incomprehensible reason.
That's true, but it's also true that a sophisticated and well formed PR is probably genuine too. Hostile PRs are the exception rather than the rule. And if only the high quality PRs are treated with suspicion, then the attackers will tailor their approach to mimic novices. General vigilance is required, but failure is likely because these attacks are so rare that maintainers will grow weary of being paranoid about a threat they've never seen in years of suspicion and let their guard down.
Earlier this year, I received a hostile PR for a "maintenance only" JavaScript authentication library with less than 100 stars but which is actively used by my employer.
It added a "kinda useful but not really needed" feature and removed an unrelated line of code, thereby introducing a minor security vulnerability.
My suspicion is that these low quality PRs are similar to the intentional typos in spam emails: Identify projects/ maintainers who are sloppy/ gullible enough and start getting a foot in the door.
Who can share a threat model with specific probability estimates on this? FWIW, I’m less interested in the particular estimates (priors) and more interested in the structure.
Honestly, a good PR should have a very clear description of the idea and a sample implementation, and then a trusted core contributor re-implements the fix on his own. But Github users are entitled and spoiled by Github-marketed commercial software, so they will rage at this.
Sounds like you're describing an issue, not a PR.
An issue usually doesn’t have the code for implementation of the solution. Yes, very often patches are attached in comments, but they are not required and usually attached by other people, not the author.
It's not the new contributors you have to watch, it's the sleeper contributor who has built up a solid reputation and then is "activated". At least that's how I understand XZ.
"Trusted account for sale"
It’s both. The fact that one happened recently does not preclude the other.
Wasn't a key thing of the xz attack vector that people were encouraged to download the custom source release instead of the autogenerated GitHub one? I don't know if that is a pattern but it seems like best practices in the (source) supply-chain could prevent a large class of these attacks.
That is unfortunately how the `autotools` ecosystem works; although I guess projects could guide their users to run `autoreconf -i` if working with the source code instead of the release tarballs before doing the usual `./configure && make && make install` step.
yep.
same with npm. i publish releases of my OSS libs to npm, but there's no guarantee that what is uploaded is what you see on github. that's a lot of trust you have to put into my opsec, etc. not good.
Why is there not a policy that any PR can be rewritten by a maintainer? Wherever the PR looks a bit odd, rewrite it to do the same thing a different way. Enough unpredictable change to disrupt finely-tuned subterfuge.
you can wait for three (or x) PRs passing specified unit tests for functionality and then merge a random one. But this is a luxury (effort-wise) for any kind of project.
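The selection step itself is tiny. A minimal sketch, assuming each candidate PR is a dict with a `tests_passed` flag (a made-up shape, not any forge's API) and using `secrets` so the pick is not predictable by a contributor:

```python
import secrets


def pick_pr_to_merge(candidates, min_candidates=3):
    """Pick one PR at random among those passing the tests.

    Returns None until at least `min_candidates` independent implementations pass.
    """
    passing = [pr for pr in candidates if pr["tests_passed"]]
    if len(passing) < min_candidates:
        return None
    return secrets.choice(passing)
```

The expensive part is everything around it: getting several independent implementations of the same change and a test suite that actually pins down the intended behaviour.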
I’m in it for a free t-shirt.
https://medium.com/pentesternepal/hacking-dutch-government-f...
Looking at some of these cases, each PR on their own doesn’t look suspicious, but it was what they all built up to — in some cases from multiple bad actor contributors that, on the surface, weren’t connected.
the attitude reminds me of maintaining game-servers and looking out for cheaters; once we had a handful of folks looking out for cheaters, it turned the community against itself calling everybody a cheater... i think it is good to be cautious; but overall it's the same cat and mouse game we've seen before. i can only say good luck on not letting it stress you out second guessing other folks' actions and intent - and hope we continue writing code for humans to read vs the cryptic, obfuscated, even "elegant" code (not to dive into the skill issue rabbit hole lol)
This should be the mindset for any commit. I mean, they could add something benign unknowingly. Something could hack their account, or abuse a flaw in your commit-system to appear as someone else. They could have a melt-down, or strange ideas and add something for nonsensical reasons. Shit can happen all the time from all directions for any reason.
This is what I feared the most. Trust issues that lead to less progress and a community that slowly drowns in suspicions. When the XZ thing happened they were already going after one person accusing them of being part of the whole thing in one of the bug reports on GitHub.
what do you look for to prevent this?