My personal takeaways from this:
1. Source distribution tarballs that contain code different from what's in the source repository are bad; we should move away from them. The other big supply chain attack (event-stream) also took advantage of something similar.
1a. As a consequence of (1) autogenerated artifacts should always be committed.
2. Autogenerated artifacts that everyone page-downs past during code reviews are a problem. If you have this kind of stuff in your repository, also have an automatic test that checks that nobody tampered with it (it will also keep you from having stale autogenerated files in your repository).
3. A corollary of (1) and (2) is that autotools is bad and the autotools culture is bad.
4. Libsystemd is a problem for the ecosystem. People get dismissed as systemd haters for pointing this out but it's big, complicated, has a lot of dependencies and most programs use a tiny fraction of it. Encouraging every service to depend on it for initialization notifications is insane.
5. In general there's a culture that code reuse is always good, that depending on large libraries for small amounts of functionality is good. This is not true: dependencies are a maintenance burden and a security risk, and this needs to be weighed against the functionality they bring in.
6. Distro maintainers applying substantial patches to packages is a problem, it creates widely used de facto forks for libraries and applications that do not have real maintainers looking at them.
7. We need to make OSS work from the financial point of view for developers. Liblzma and xz-utils probably have tens of millions of installs but a single maintainer with mental health problems.
8. This sucks to say, but code reviews and handing off maintainership, at the moment, need to take into account geopolitical considerations.
That won't help. There's no evidence that Jia Tan is a real name, or even a real person for that matter. If projects stop accepting contributions from Asian-sounding names, the next attack will just use Richard Jones as a name.
I think you could interpret this as "you need to know, personally, the party you're handing this off to, and make reasonable judgments as to whether or not they could be easily compromised by bad actors".
Like, meeting someone at several dev conferences should be a requirement at the very least.
This is utterly and completely unfeasible. Most open source maintainers, especially those that are struggling and are pressured to hand off maintenance, don't have the time, means, or will to travel to meet up with prospective co-maintainers, not just once but multiple times.
In practice it would just result in projects getting abandoned, the prospective co-maintainer starting a fork, and everyone switching to use the fork.
Really, it starts before things get bad. This thing where - in the famous XKCD example - a single guy is thanklessly maintaining a project for 20 years in Nebraska needs to stop. Software libraries like these are no longer a one-person job. They can't be for the bus factor alone. Major projects like Linux distros or bigger foundations like Apache or Mozilla need to start harping on people hard to contribute to important libraries. We'll get to whatever the buzzword of the day is once we do the important work first.
Find a way to make it happen. Sorry, "I just can't" isn't going to cut it after this.
What bigger foundations? The Apache Foundation has a yearly revenue of $2.1 million. Why do you think they reacted to log4j the way they did? There are no resources.
Open source is running on fumes.
pretty much this.
That's why, whatever anyone thinks of Theo's antics, I appreciated the OpenSSL/LibreSSL Valhalla blogs and the overall effort to do something about it.
TBH I'm amazed that, in its current state, Apache took in Pekko (FKA JVM Akka...); part of me is guessing it's because some of their other infra is dependent on it...
Foundation-based OSS is on fumes. Open core... I am still hopeful for it on many levels.
I agree we need to stop depending on the 20-year hobby project of the guy in Nebraska, but adding barriers (which requiring travel and in-person meetings is) to sharing the load is not the solution. What these projects need is the necessary resources (mostly money) for multiple people to work on it professionally.
What I don't understand is, where are all the code and security contributions from Big N and other multi-billion-dollar, international-scale users? Do they all have their own internal fork of every major library? If not, you would think it would be in their own financial interest to keep somebody on payroll to maintain fundamental libraries like this.
Is the argument that well known software should be taken over by professionals? There are many motivated software maintainers, including single guys in Nebraska, who have better operational security than well funded companies.
Remember the recent incident with the signing keys at Microsoft? Or the one before that? And these are the biggest, most well funded, companies on Earth we are talking about.
Organizations such as Let's Encrypt work well because they are staffed with motivated and competent people, not because they are well funded. This is not a problem that can be solved with funding alone.
I think this is a really important point. Every commercial contract I've been involved in has clauses that are intended to mitigate supplier risk, eg that they go out of business, and the contracts people do due diligence on suppliers to vet that they are who they say, try to eliminate the one-person operations, and generally mitigate risk if they really need the code but the only supplier is a tiny startup.
Perhaps large corpos need to apply their standard risk mitigation lens to their supply chain. Their stack or their security depends on these 390 packages. 27 of them have less than 3 maintainers. Recommendation: find alternatives.
You could probably get about 80% of the job done with just 20% of the work. I’m not in OSS but I hire technical people remotely, and I know many people that do. Some consultant friends have caught blatant scams within the first couple of rounds of interviews on Zoom.
>I’m not in OSS but I hire technical people remotely, [...] interviews on Zoom.
Different situations with different incentives and psychology:
- potential to receive money : job candidates are willing to get on Zoom calls or meet in person because they want a paycheck.
- no money involved & volunteer for free : potential open source contributors are not interested in getting on Zoom calls for $0 pay.
That's when you dangle a grant in front of them.
Also, while I'm not a lawyer, and in general the bar is very high to criminally prosecute an employee for doing "a bad job," I wouldn't be surprised if there are major jurisdictions where intentionally backdooring your employer's code could land you in prison.
Interviewing is a different situation though, because you start with essentially no relationship with the interviewee, and you haven't seen their work. OSS projects don't just add everyone that asks as a co-maintainer. Usually it's someone that has contributed to the project for a while already, and through that has shown they understand the code, are capable of improving it, and can make sensible decisions about the road to take.
Yea, I get that, from what I see in the details of this case the contributor was very competent. What I'm questioning here is whether in OSS projects there is enough face-to-face communication, probing about values, etc.
Intelligence agencies mastered fooling people about their bona fides in person a long time ago. Meeting someone in person will stop casual individuals who just want to crash the project for the lols or some other personal-level reason, but it would have been barely a bump in the road for this clearly-nation-state-level attack.
It adds another layer of complexity, though, and when someone trips up (and eventually, they will), it lets us know who is doing what and why. It also adds another layer of expense and vulnerability. Part of the beauty of cyberattacks for intelligence agencies is that they are very light on tradecraft. This helps to reduce that advantage.
We actually know who poisoned Alexander Litvinenko and what they're up to today, for example.[0]
[0] https://www.bbc.com/news/uk-35370621
I don't think it's crazy for a maintainer to Google the person a bit, and if there is no positive match, ask the other person for at least a little bit of detail about themselves, like where they live (country/city), who they work for, etc. Maybe hop on a phone call or something.
In this case, Jia Tan just doesn't seem to match any real person we can find online. It's not like there's an elaborate online persona that they have really built.
While I don't want to put Lasse Collin on trial since he's a victim too, I do think he owes the community an update and explanation of what went down. It's not because we want to point fingers at him, but to learn from the experience.
1) Not everyone (current and potential future maintainers) has the time to go to dev conferences.
2) Simply meeting IRL is a terrible proxy for credibility.
Disagree; trust your intuition, but you can never do that if you never meet IRL.
Also, it's not racist or xenophobic to recognize that some countries exercise nearly complete control over their citizens (and sometimes indirectly over non-citizens), and that those people could be putting themselves at extreme personal risk by disobeying those dictates (assuming they did disagree, which doesn't seem to be a given)
I'm sure Edward Snowden also met up with colleagues in the office at least a few times. May have even passed a security clearance.
Hold up, where did I make this claim about national origin/external pressure?
I'm only suggesting if you have pets, a kid, or a project at work, conferences take a non-zero amount of time to plan to attend.
Plus, what conference options even exist if you're finding other people for the xz library? Searching for #CompressionConf2024 isn't turning up much.
And that's why we know who Edward Snowden is. That's more than we can say about Jia Tan.
Say what you will about what he did and why, it is going to be very, very hard for someone to explain to a contract's security auditor why, in the year 2024, a commit from an account known to belong to Edward Snowden is in the source code of security-critical software.
And that's what FOSS-based companies and orgs need to start doing after this. If I'm working for Debian/Mozilla/Apache/wherever, I'm going to start asking project maintainers more about who they are. "Hey man, we've got an all-expenses-paid trip to one of the major conferences this year, which one can we put you down for?" needs to come out of someone's mouth at some point, and excluding some very good reasons and evidence for why they can't appear at one of these events in-person (think health or long-term family obligation reasons, confirmed by multiple people who know the maintainer), they need to be at one or more meetings within a reasonable amount of time. Randomly-timed remote video meetings could work in a pinch.
If they can't after a couple of years, then these projects need to inform the maintainers that they'll be forking the project and putting it under a maintainer who can be verified as a living, breathing, single person.
Repeat until there's at least some idea of who's working on most of these projects that make up critical systems that society is built upon.
Let's use the current theory that this is a state sponsored attack. If that's the case, another Jia Tan will be recruited. The identity of a single person simply doesn't matter. All that matters is that the attack was attempted.
Consider the issue of candidates who lie in the interviewing process by hiring other people to interview on their behalf. Now replace "interview" with "attend conference". This is just adding another vector of blind trust waiting to be abused.
It raises the bar quite a bit.
Especially when you've met Jia Tan and the new Jia Tan is obviously not the same person.
Meeting in person is quite literally the opposite of blind trust. Blind trust would be assuming that the person physically sitting on the other end of the internet connection and controlling Jia Tan's keys is the same Jia Tan you had lunch with a few months ago.
You cannot prove the person behind the keyboard is the person who is meeting up with people.
This is blind trust because of the assumption that the person is the same.
This is true even when they are no longer in that country. Some governments are known to threaten the family of expatriates. "Do this for us or mom and dad are going to spend the rest of their soon to be short lives doing hard labor" is a pretty tough threat to ignore.
What would be better?
It's naive to believe that any form of physical presence means someone isn't going to do something nefarious in the eyes of the project.
This problem can only be solved by more skilled eyes on the projects that we rely on. How do we get there? shrug.gif.
Anything less is trying to find a cheap and ineffective shortcut in this trust model.
It's not the only thing, but it is something.
There's a lot of social engineering that went into the xz backdoor[0]. This started years ago; Jia Tan was posting in projects and suddenly someone appeared to pressure projects to accept their code. Who's Jia Tan? Who's Jigar Kumar, the person who is pressuring others to accept patches from Jia Tan? We don't know. Probably some person or group sponsored by a state APT, but we don't know for sure, because they're currently just text on a screen.
Having this person or group of people have to continually commit to the bit of being a publicly known open-source maintainer who attends conferences, has an actual face, and is on security camera footage at multiple hotels and airports is far, far harder than just talking a vulnerable person into allowing maintainer access on a repository. Making them show up to different places a few times adds a layer of identity. Otherwise these "skilled eyes" could be anyone with a wide variety of motivations.
[0]https://boehs.org/node/everything-i-know-about-the-xz-backdo...
This is assuming maintainers even care/want to go.
The same footage that'll get wiped a few weeks after the conference ends, and quickly becomes not useful.
This is wonderful posturing in the name of security theater but doesn't solve anything.
If they don't want to go, don't use their project. Sorry, these aren't the TI-83 games you passed around at your high school with programming cables; they're the code libraries our society is built on. If my project relies on your project, I need to know who you are. If I can't figure that out, I'll try to find another one.
Along with receipts, eyewitnesses, plane tickets, etc. that put a person at a place at a time. Doesn't all have to be digital evidence.
You have a good point, but there's also a reason why companies like people to come into work and don't hire remotely as much as they should (or could). There's a reason why interviews often include a meal together. Meeting people IRL is good for building trust, on both sides.
This is almost impossible for remote OSS maintainers. Do you want people to upload passports? And what if a three-letter agency can easily produce whatever material you want?
Sounds like it's time for someone to either pay a few visits to the remote maintainer or give them a scholarship for attending a few conferences.
The big companies can do that. But then there is also the question of -- how many of these critical OS libraries are there in the wilderness?
I feel a census coming on.
There needs to be a reckoning of who is doing what where on this sort of thing. After this whole fiasco you'll probably see more contracts wanting to know who's working on these things, and that will, in turn, have people auditing their software's packages.
The reality of such criteria is that it will be a ladder-pull for any new entrants.
Not a workable option, full stop.
This is really not the responsibility of unpaid developers.
Big vendors should pay to get to know them, because they're the ones making the money off of the developers' work, but "I don't want to meet anybody and want to just manage the project" is the FOSS version of "just trust me bro".
That's not what the above commenter said. This may be your interpretation but the above commenter is essentially saying "don't work with Chinese-sounding developers" and is the completely wrong take here. Jia Tan may or may not be Chinese but the core issue is the lack of basic vetting to make sure he/she/they are a real person.
I don't think that #8 implies that projects should stop accepting contributions from Asian-sounding names. To me it means that people should be more careful about who they give access to. It doesn't matter if it was China or some other state or organization pretending to be China; the problem is that people assume an open source contributor will act in an altruistic way, when they can in fact be a malicious entity.
And to build on your point (hopefully), one way of understanding #8 is that it's not out of the question that bad actors have the time, resources, and patience to coordinate long-term campaigns of significant subtlety, the type of which is more easily pulled off by a state actor. Facts such as those should inform our presumptions about when and where people enjoy the benefit of the doubt.
For example, we hope that Linus is not a long-term agent of the Suojelupoliisi - but how would you prove it?
Ideally, the "proof is in the code" and the review setup is strong enough that it could handle a Compromised Linus™, even if it couldn't handle multiple compromises.
I mean I would hope that there's a way to separate out the Linuses from the Jia Tans. But it's no longer out of the question that a campaign can build up an account or accounts with long-term histories of good standing that really challenge our intuitions.
But I suppose you are right, the best backstop is for the proof to be in the code.
This is so obvious that needing to say it shows how prejudice can blind people.
This is why things like KYC exist in other contexts.
The problem with any social test is that it's biased by default towards whoever is controlling access.
I would personally add: test your build scripts.
There are so many bugs like the "added dot to disable Landlock" introduced as part of this attack (the same thing can also happen as an honest typo [0]), not to mention that relying on some tools in autoconf to set feature flags will just silently disable them if those tools are not present [1].
[0] https://twitter.com/disconnect3d_pl/status/17744965092596453...
[1] https://twitter.com/disconnect3d_pl/status/17747470223623252...
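For context on how a single character can do that: configure/CMake-style checks usually compile a small throwaway C program and set a feature macro only if the compilation succeeds. Here is a rough illustrative sketch of such a probe (not xz's actual check; the macro name and the exact syscall usage are my assumptions):

    /* Illustrative feature probe (not the real xz check). The build system
       tries to compile this file and enables the sandbox (e.g. defines a
       HAVE_LANDLOCK-style macro) only if compilation succeeds. */
    #define _GNU_SOURCE
    #include <linux/landlock.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void) {
        /* Touch both a header constant and the syscall number, so both
           must exist for the probe to compile. */
        return (int)syscall(SYS_landlock_create_ruleset,
                            (void *)0, 0, LANDLOCK_CREATE_RULESET_VERSION);
    }

Prepend one stray character anywhere in a file like this and it never compiles, so the check silently concludes "no Landlock support" on every system and the sandbox is never built in. That's exactly the failure mode an automated "did the probe behave as expected" test would catch.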
I don’t understand how this is still the best way to test if features are available in C. Can’t the OS / environment provide a “features_available” JSON blob listing all the features on the host system? Is AVX2 available on the cpu? OpenSSL? (And if so, where?) and what about kernel features like io_uring?
Doing haphazard feature detection by test compiling random hand written C programs in a giant sometimes autogenerated configure script is an icon of everything wrong with Unix.
This hack shows that the haphazard mess of configure isn’t just ugly. It’s also a pathway for malicious people to sneak backdoors into our projects and our computers. It’s time to move on.
Satire is so hard on the Internet and maybe I'm just thick headed. Just to clarify things, is that a suggestion to shove a JSON parser into either bash or autoconf?
JSON is too far, says the engineering culture still relying on a pile of shell scripts like it’s 1970.
(that’s unfair, there’s probably tooling to build the shell scripts automatically I bet)
You have it backwards. The engineering culture that said autoconf increases attack surface, the culture that said that the design of PAM is too complex, that did not accept the patch to link libsystemd to sshd, is the one that constantly tries to avoid unnecessary dependencies.
The engineering culture that non-ironically suggests linking a JSON parser is the culture that disregards the challenges that maintaining dependencies brings.
Yesterday it may have been soups of automatically generated shell scripts, but today it is soups of automatically generated YAML and JSON.
... The engineering culture which gave us 200kb fragile, semi-autogenerated configure scripts checked in to our repositories. Configure scripts which - as we've just seen - are a great place to hide malicious code.
I can't take this criticism seriously. 200kb of configure script = good, 1000 lines of JSON parser in bash = bad? What?
https://xkcd.com/927/
"How to demotivate people and prevent innovation!"
I think that would land harder if configure / automake / autoconf were actually a standard. And not, you know, a bunch of cobbled together shell scripts that generate other shell scripts.
A few things here.
First, yes, there's several tools which provide (incomplete) feature selection functionality, you can see some sibling comments for examples.
Second, especially in complex projects, the presence of a feature doesn't necessarily mean it's sufficiently complete to be workable. You can run into issues like, say, "I need io_uring, but I need an io_uring op added in version X.Y and so it's not sufficient to say 'do I support io_uring.'" Or you can run into issues like "this feature exists, but it doesn't work in all cases, particularly the ones I want to use it for."
Third, there's no real alternative to feature detection. In practice, build systems need to cope with systems that pretend to be other systems via incompletely-implemented compatibility layers. Version detection ends up creating the User-Agent problem, where every web browser pretends to be somebody pretending to be somebody pretending to be Netscape 5.x and if you try to fix this, the web breaks. (Not to mention the difficulty of sniffing versions correctly; famously, MS skipped Windows 9 reportedly because too many build systems interpreted that to mean Windows 95 or Windows 98 with catastrophic results).
The end result of all of this is that the most robust and reliable way to do feature detection is to try to use the feature and see if it works.
That sounds fine though. If the system claims to provide feature X, you probably want the program in question to compile assuming feature X is available. If the compatibility layer doesn’t work as advertised, a compiler error is a great choice. Let the user choose to turn off that flag in their system configuration when building the project.
I’m not proposing user agent sniffing. I’m proposing something much more fine grained than that. Make something that looks more like the output of configure that build systems can use as input.
There are some utilities like pkg-config [1] and /proc/cpuinfo [2] that try to provide useful configuration information in distribution agnostic ways.
[1] https://en.wikipedia.org/wiki/Pkg-config
[2] https://www.baeldung.com/linux/proc-cpuinfo-flags
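For the CPU-feature side specifically (the AVX2 example upthread), you don't even need to parse /proc/cpuinfo: GCC and Clang expose a runtime probe as a compiler builtin on x86. A minimal sketch, assuming an x86 target and a GCC-compatible compiler:

    #include <stdio.h>

    int main(void) {
        /* Ask the CPU at runtime instead of parsing /proc/cpuinfo. */
        __builtin_cpu_init();
        if (__builtin_cpu_supports("avx2"))
            puts("AVX2: available");
        else
            puts("AVX2: not available");
        return 0;
    }

That only covers CPU flags, of course; library and kernel features still need pkg-config-style metadata or the compile-and-see approach described above.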
True, but it works quite well which is why it is widely used. If you need to ensure that your C code will implement a desired feature, testing it with a small program before building makes a lot of sense. With different operating systems running various C compilers that all work slightly differently, it is a proven approach that achieves the needed outcome, however ugly it might be.
A similar thing could easily be done with other approaches: a typo of a feature name would have the same effect. The main issue is that autoconf is such a mess of layered bad scripting languages that it's impossible to really get a good idea of what is actually going on.
In general I think feature detection is probably not necessary for most cases (especially the cases that a huge number of autoconf scripts do: the number of linux-only projects which have feature detection for a feature which is certainly present is ridiculous). It's much more reasonable to just try to build with all features, and provide manual flags to disable unwanted or unavailable ones. These at least mean the user/maintainer can decide more explicitly if that feature should be present or not.
That wouldn't be consistent with the "unix philosophy".
They never did. In fact the systemd maintainers are confused on that point and adding documentation on how to implement the simple datagram without libsystemd.
Way more than tens of millions. Python, php, ruby and many other languages depend on libxml2, libxml2 uses liblzma. And there's many other dependencies.
Not any maintainer's job. OSS is provided without warranty. Also, the indication is that "Jia Tan" may have been completely fake, as their commit timestamps show their timezone switching from Eastern Europe to Asia, even on the same day. So at the very least, they were playing identity games.
I said this days ago, but re timezones: they are meaningless, as even GCHQ, NSA, etc. will place false flags in code which has any kind of risk of exposure. I first learned about those techniques from all the high-profile leaks out of US intelligence agencies, which were performing those techniques themselves.
It's not that simple. You can falsify a lot of things, but you can't easily falsify the working hours at which you reply to issues or push commits without taking a lot of care. Especially when DST has to be considered.
Sure, the +0800 timestamps are definitely fake. A handful of timestamps that were later scrubbed show +0200 and +0300, though. And all the commits match 9am to 6pm working hours if you interpret them as +0200/+0300. The working hours even shift around correctly with the DST change.
The issue is that Russia doesn't observe DST anymore. That leaves Bulgaria, Cyprus, Estonia, Finland, Greece, Israel, Latvia, Lebanon, Lithuania, Moldova, Romania and Ukraine. Very few of those have the infosec capabilities needed for something like this.
Jia Tan was registered in 2021, but first sprang into action during the buildup to the Russian invasion of Ukraine in 2022.
Jia Tan also used a VPN provider that's headquartered in the US. That only makes sense if they're in a US-aligned country, as using a US VPN would give the US more insight into what you're doing, and only protect you from other countries.
Personally, I'd guess that it was Israeli intelligence. But Finland, where the original XZ author lives, is another interesting possibility.
Re the falsifying of working hours: wouldn't these boffins be able to automate Git commits at certain times, or even pass instructions to another team working the late night shift to post these changes, etc.?
I went to McDonald's last night; it is open 24/7. These spy agencies surely aren't lazier than minimum-wage McDonald's employees; I am sure they work around the clock. Plus, you have night hawks like me who get more stuck into a project at 4am and sleep through the day.
Israeli intelligence, ah probably. Wouldn’t be surprised. I imagine if it was GCHQ, it wouldn’t have been so noisy and got uncovered like this.
Is it possible? Definitely. But that's extremely rare, especially if you want to keep a relatively natural pattern for the commits and replies.
You'd basically have to have a team of devs working at really odd times and a queuing system that automatically queues all emails, GitHub interactions, commits, etc. to dispatch them at correctly distributed timestamps.
And you'd need a source pattern to base your distribution on, which is hard to correctly model as well.
e.g., if someone slept badly one night, the next morning their interactions shift slightly back and are more sparse in the morning. Their lunch break will also shift due to that. Such changes usually are most prominent in the days surrounding DST changes.
What sort of nonsense is this? Have you ever actually known any software developers? A huge number of them keep odd hours, moreso in the infosec sphere. They wouldn't need to automate anything, just start working hours that match the timezone that they're faking... If it really is a state actor, I imagine they'd be able to find someone willing to keep those hours.
I agree. I don’t follow the time zone of the country I live in as I work remote. Haven’t seen the sun in a while. Probably not healthy.
I've spent a lot of time analysing activity patterns, it's not as simple as you think, especially once you combine IRC, GitHub interactions and commits themselves.
These are my own hours for example (from a few years ago): https://i.k8r.eu/lgN3ug.png
I would have to agree with kushku; it's a very, very high bar to fake thousands of timestamps over several years in a consistent way that doesn't attract suspicion under forensic scrutiny.
Of course this is post facto so not that helpful until after something serious happens.
If there was some sort of reputation system that could do this analysis automatically then that would be very useful.
You don't need to automate git commits to fake the timestamps; just change system clock to the desired time, make the commit and then reset the clock to local time when you're done. It all happens on the local machine so the timestamps of commits should be considered completely untrusted information.
That's exactly what Jia Tan did, but this failed a few times during rebases as well as with commits done from the web UI.
Additionally timestamps from comments on GitHub itself are trusted information and match the UTC+2/UTC+3 data well.
To clarify (1), we should not be exchanging tarballs, period. Regardless of whether it's different from what's in the source repository.
It's 2024, not 1994. If something masquerading as open-source software is not committed to, and built from, a publicly verifiable version-controlled repository, it might as well not exist.
So all of Fabrice Bellard's projects (https://bellard.org/) should not exist?
if these projects have any value, someone will clone them to a public repository.
While introducing minor changes to the build system scripts, you mean?
i'll take any publicly visible build system over a hidden one
Nothing here was ever hidden.
all of Fabrice Bellard's projects are primarily available as tarballs from their website.
definitely more hidden than a public git repo. and allows for xz style backdoors without being able to investigate them after the fact.
I don't see what point you're trying to make. For me a tarball is at least as public as a "public git repo" (whatever that means). In fact I would argue a git repository allows for way more opportunities for obfuscation, seeing that it is a much more complex format.
Most of his well-known projects have public git repositories that you can clone, inspect, and contribute to.
Or… just have downstream users run autotools as part of the build?
See point 3:
Whether it's autotools or not is not very relevant to point 1a. I'm also confused why point 1 leads to 1a, and not to the opposite of 1a.
Source distribution tarballs should not contain code different from what's in the source repository. They should not contain automatically generated artifacts, since those should not be in the repository, since they are by definition not the source, but output of some kind of build process.
Having the automatically generated configure script in the repository would have made it slightly easier to spot the backdoor if anyone took the time to read the committed configure script, but if it's already in the repository most people will just take that for granted, not run whatever process generates it, and not notice that it's not actually the output of said process.
I think the point is that all of the code which will get compiled to produce the final binary should be in the repo, and so any generated code that affects the final binary should be in the repo.
The use of autotools or other similar tools, ones that are supposed to generate code on the fly on the final user's machine, make this requirement essentially impossible.
Your compiler also falls under that bucket, though.
Reference the old Ken Thompson compiler hack to provide a backdoor for Unix logins: https://wiki.c2.com/?TheKenThompsonHack
In this case, we're talking about someone distributing source code for others to compile, so that is out of scope.
That very last point seems like something that's fairly amenable to automation, though. (Then, that automation can be attacked, but that seems like one more [fairly independent] layer that must be bypassed to execute one of these attacks.)
Part of what's bad about autotools culture is running it when creating the release so people don't need to run it before building.
Letting people run autotools would completely avoid this one hack.
But well, you have a point in that most of what makes autotools bad is that you can't expect your userbase to learn how to use it.
Why don't object files and binaries count as autogenerated artifacts? Should we commit those to the repo too? Where is the line between an artifact that should be committed, and one that shouldn't be?
libc will dynamically load libnss-* on a lot of platforms, some of which can link to a bunch of other helper libraries. What if the attack had come via one of those 2-or-3-dependencies-removed libraries? libc is big and complicated and most programs only use a tiny fraction of it. Is libc a problem for the ecosystem?
Absolutely yes. And also the size of the kernel.
Those two currently have a much better guaranteed quality than systemd, thus systemd is a much more pressing issue. But they don't stop being a problem just because they are not the largest one.
What are you basing that on?
Historical behavior.
There doesn't exist anything else this could be based on.
Does systemd have a historical record of its defect rate being significantly greater than that of the kernel/glibc?
I just checked Coverity scans, and the most recent defect densities appear to be:
which actually looks pretty good for systemd. Is there some other analysis you're basing this off? Or are the current rates atypical, and systemd used to be a lot worse?
https://scan.coverity.com/projects/linux
https://scan.coverity.com/projects/gnu-c-library-glibc
https://scan.coverity.com/projects/systemd
The kernel's size is mostly in the hardware support, only a tiny fraction of which is actually active. Linux has a lot of drivers, many of them are crap, but it's not obvious to me that Linux would be better off with no driver than with a crap driver.
That tiny fraction is quite huge. Filesystems and networking support are well known problematic areas, the sound system is a chapter by itself, and Linux is full of old, should-be-unused interfaces that attackers successfully use once in a while.
Besides, the core part of the kernel is way too big for anybody to read. And any of it can interact with any other part.
IMO yes. I definitely believe having basic common functionality (malloc, printf, memcpy etc.) provided by one library with all the crazy/obscure stuff that very few people need or want somewhere else would be an improvement.
I'd say anything that is input to the compiler should be.
Yes, the libnss stuff is also a problem.
I couldn't agree more. Coming from the BSD world, systemd is a shock to the system; it's monstrous and has tendrils everywhere.
If you actually come from BSD, you'd hopefully recognize a set of different utilities combined to form a holistic system released under a single name. It's not a new idea.
Besides, the gpp is incorrect: systemd dependencies are not needed for initialisation notifications.
I think there's a low-effort solution to GP: Just split off the notification function for now.
There's a dilemma here: Make a huge number of tiny libraries and people complain about left-pad. Make a monolith and this type of attack can happen. If left-pad is more preventable, let's go that way. The fact that C and C++ have tons of overhead in producing a package is their problem to deal with through better tooling.
Making a number of similar libraries that would be better served as some sort of common set is a problem (i.e., even at the most basic level, right pad and left pad can be in one thing, RIGHT?)... but at the same time it's a particularly bad example, because the overall behavior of that trend was a form of influencer growth hacking.
that said, I think something like a 'notification function' falls into the category of 'boundary API' and those should always be segregated where possible for security as well as maintenance purposes for all parties.
100% agree that some of the functionality could be decoupled, and either the project should provide independent helper libs or at least do a better job of documenting the interfaces.
In this specific case, the notification interface is documented (and there's client implementations in a bunch of languages).
I read recently that systemd does not recommend that you link to libsystemd to participate in systemd-notify message passing. The API for talking to it is quite simple, and vendors are encouraged to implement a compliant interface rather than loading all of libsystemd into their program to manage this. This of course would mean maintaining your own compliant interface as API changes happen, which is likely why it isn't done more frequently. It seems to me that there would be a lot of value in systemd stubbing out libraries for the various functions so that dependent projects could link to only the specific parts of systemd they need. That, or some other way to configure what code gets loaded when linking libsystemd. Full disclosure, I've not looked at libsystemd to see if this is already possible or if there are other recommendations by the project.
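For reference, a minimal sketch of what such a compliant interface can look like, based on the documented protocol (this is my own illustration, not systemd's code; the notify_ready() helper name is made up, and it only handles the common filesystem-path and abstract-namespace cases and only sends READY=1):

    #include <stddef.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/un.h>
    #include <unistd.h>

    /* Send "READY=1" to the socket named in $NOTIFY_SOCKET. Returns 1 if
       sent, 0 if not running under a notify-aware manager, -1 on error. */
    int notify_ready(void) {
        const char *path = getenv("NOTIFY_SOCKET");
        if (!path || (path[0] != '/' && path[0] != '@'))
            return 0;                        /* not asked to notify: no-op */

        struct sockaddr_un addr = { .sun_family = AF_UNIX };
        size_t len = strlen(path);
        if (len == 0 || len >= sizeof(addr.sun_path))
            return -1;
        memcpy(addr.sun_path, path, len);
        if (addr.sun_path[0] == '@')
            addr.sun_path[0] = '\0';         /* abstract namespace socket */

        int fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (fd < 0)
            return -1;

        const char *msg = "READY=1";
        ssize_t n = sendto(fd, msg, strlen(msg), 0, (struct sockaddr *)&addr,
                           (socklen_t)(offsetof(struct sockaddr_un, sun_path) + len));
        close(fd);
        return n < 0 ? -1 : 1;
    }

As I understand it, the SCM_CREDENTIALS part mentioned in the docs is attached by the kernel when the receiver enables SO_PASSCRED on an AF_UNIX socket, so a sender like this doesn't need to do anything special for it; the VSOCK case only matters for VM use cases.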
libsystemd is too juicy of a target, especially with the code reuse that does not appear to take into account these attack vectors.
Perhaps any reuse of libraries in sensitive areas like libsystemd should require a separate copy and more rigorous review? This would allow things like liblzma to be 'reused', but the 'safe' versions would require a separate codebase that gets audited updates from the mainline.
libsystemd had already removed the lzma dependencies last week, before the backdoor became public, to reduce external dependencies.
No they did not. The dependency is still there, it's just being lazy loaded.
This would have prevented this particular exploit, which would have needed to take another approach, but at the price of making dependencies invisible and hard to debug. You could no longer have found vulnerable systems by way of ldd.
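For anyone unfamiliar with what the lazy loading looks like: the library is pulled in via dlopen() at runtime instead of being a DT_NEEDED entry in the ELF headers, which is exactly why ldd no longer lists it. A rough sketch (illustrative only, not systemd's actual code; lzma_version_string is a symbol from liblzma's public API):

    #include <dlfcn.h>
    #include <stdio.h>

    int main(void) {
        /* Loaded at runtime, so it never shows up in ldd output. */
        void *handle = dlopen("liblzma.so.5", RTLD_NOW);
        if (!handle) {
            fprintf(stderr, "liblzma unavailable: %s\n", dlerror());
            return 1;   /* feature degrades instead of the binary failing to link */
        }

        /* Resolve only the symbols actually needed, when needed. */
        const char *(*version)(void) =
            (const char *(*)(void))dlsym(handle, "lzma_version_string");
        if (version)
            printf("liblzma %s loaded on demand\n", version());

        dlclose(handle);
        return 0;
    }

(On older glibc you'd link with -ldl; since glibc 2.34 dlopen lives in libc itself.) The flip side is exactly what the parent says: the dependency becomes invisible to static inspection tools.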
The only solution for the attack surface of systemd is to make the individual components more loosely coupled. There is no reason the same library is responsible for readiness reporting and reading logs.
One could even argue that none of those functions have anything to do with the job of init. Readiness can break in a number of ways, robustness is built on health checks.
Even if it hadn't been loaded by libsystemd, liblzma is also loaded by SELinux, which would have allowed the same vulnerability via a different vector.
Personally I think projects like fedora silverblue/kinoite and other container-based OSes are going in the right direction. We need a base OS that's as small as possible so it can be audited, and everything else then needs to live in a container so it doesn't have to be audited but is still secured properly.
Well, not the same vulnerability, as libselinux isn't loaded by sshd. Any such request would probably have a low probability of acceptance among the openssh maintainers.
If anything, I think this shows that real world security is hard and must happen at every level. This library is likely to be included in any base OS no matter how small, and rebuilding the container world just to patch is inefficient.
This attack may have been found by luck alone, even if that luck involved having talented developers on our side, but it really showed how well the open source community responds to such attacks. Within a day of it being public, we had well mapped out what the problem was and how to best respond to it. A day that was also a holiday in large parts of the world.
More personal observations:
8. Consumers are naive, yes. But the software industry itself is naive about the security threat.
9. The social exploit is part of the code exploit.
10. The FOSS axiom "More Eyes On The Code" works, but only if the "eyes" are educated. FOSS needs material support from industry. A MSFT engineer caught this exploit, but it still was released to G.A. in Fedora 41, openSUSE, and Kali.
11. The dev toolchain and testing process were never conceived to test for security. (edit: Also see Solarwinds [1] )
= = =
[1] https://www.wired.com/story/the-untold-story-of-solarwinds-t...
One thing that could help with this is if somebody points an LLM at all these foundational repositories, prompted with "does this code change introduce any security issues?".
I found the black hat!
Not sure why an LLM would be better than existing static analysis tools. Many projects I have worked on run static vulnerability analysis on PRs.
That's the whole problem right there: lack of eyes on the code. If this code was actually maintained by more than one person, there's a high chance one of them would have caught on to it.
I philosophically and fundamentally hate this suggestion, but have to agree with it. It's going to make porting harder, but is sadly a cost worth paying.
Tough call. A major library is more likely to be bug-fixed and tuned than something you write (which is a good reason to use them, and is also what makes them attractive as an attack vector). Getting this right requires taste and experience. The comment says "depending on large libraries for small amounts of functionality [is bad but thought to be good]". What constitutes "small amount" vs large requires experience. Certainly cases of this tip my bias towards re-implement vs re-use.
Why would it make porting harder ?
Often the autogenerated things are dependent on the machine they are running on.
So a big reproducible build issue.
Well it really sucks if you build a system but the autogenerated file is for a different CPU.
I would more just say autogenerated artifacts should just be autogenerated by the build. Committing them doesn't really solve the problem. This is pretty much just a historical hangover in autotools where it targeted building on platforms where autotools wasn't installed, but it's not really a particularly relevant use-case anymore. (I do agree in general that autotools is bad. Especially on projects where a simple makefile is almost always sufficient and much more debuggable if autotools fails).
I don't think libsystemd is a particular problem. Or at least it being linked in only made the job of writing the exploit slightly easier: there's enough services running as root that will pull in a dependency like this that the compromise still exists, it just requires a few more hoops to jump through. And systemd has in fact deliberately made the notification process simple specifically so people can avoid the dependency (if not for security, then simply for ease of building in a way which supports systemd notification but doesn't need anything else).
Dependencies are a liability, for sure, but I think a lot of the reaction there is not entirely helpful. At least, the size of the dependency tree in a package manager is only about as good a proxy for the risk as number of lines of code is for software project progress. Dependencies need to be considered, but not just minimised out of hand. There are plenty of risks on the reimplement-it-yourself side. The main thing to consider is how many people and who you are depending on, and who's keeping an eye on them. The latter part is something which is really lacking: the most obvious thing about these OSS vulnerabilities is that basically no-one is really auditing code at all, and if people are, they are not sharing the results. It should in principle be possible to apply the advantages of open-source to that as well, but it's real hard to set up the incentives to do it (anyone starting needs to do a lot to make it worthwhile).
There are practical and philosophical problems with this. From the practical point of view you generally want to make contributing (or even just building) your stuff as low friction as possible and having extra manual build steps (install tools X at version X1.X2.X3, Y at version Y1.Y2 and Z at version Z1.Z2rc2) isn't low friction.
Philosophically, you are just shifting the attack vector around, you now need to compromise one of tools X, Y and Z, which are probably less under your control than the artifacts they produce.
People say this but I'm skeptical, this is the actual documentation of the protocol:
"These functions send a single datagram with the state string as payload to the socket referenced in the $NOTIFY_SOCKET environment variable. If the first character of $NOTIFY_SOCKET is "/" or "@", the string is understood as an AF_UNIX or Linux abstract namespace socket (respectively), and in both cases the datagram is accompanied by the process credentials of the sending service, using SCM_CREDENTIALS. If the string starts with "vsock:" then the string is understood as an AF_VSOCK address, which is useful for hypervisors/VMMs or other processes on the host to receive a notification when a virtual machine has finished booting. Note that in case the hypervisor does not support SOCK_DGRAM over AF_VSOCK, SOCK_SEQPACKET will be used instead. The address should be in the form: "vsock:CID:PORT". Note that unlike other uses of vsock, the CID is mandatory and cannot be "VMADDR_CID_ANY". Note that PID1 will send the VSOCK packets from a privileged port (i.e.: lower than 1024), as an attempt to address concerns that unprivileged processes in the guest might try to send malicious notifications to the host, driving it to make destructive decisions based on them."
So technically you have to support unix domain sockets, abstract namespace sockets, whatever SCM_CREDENTIALS is, whatever AF_VSOCK is and the SOCK_SEQPACKET note is completely obscure to me.
The fact that the protocol has changed over time is itself a problem.
It isn't reasonable to expect everyone maintaining a daemon to keep track of additions.
Not agreeing or disagreeing, just curious how you draw conclusion (4)? This attack has nothing to do with systemd or its ecosystem etc.
This attack used the fact that several distros patch OpenSSH to link to libsystemd for notifications. Libsystemd links liblzma, and the backdoor checks if it's been linked into OpenSSH's sshd process to run. Without distro maintainers linking libsystemd, xz wouldn't have been a useful target for attacking OpenSSH.
Re: 1. People keep saying this. We should stop distributing tarballs. It's an argument that completely ignores why we have release artifacts in the first place. A release artifact contains more than just autoconf scripts.
There can be many reasons to include binary blobs in a release archive. Game resources, firmware images, test cases. There was today a comment that mpv includes parts of media files generated with proprietary encoders as test cases. That's good, not bad.
The well maintained library sqlite is everywhere, and has an excellent test suite. They release not one but two tarballs with every release, for different stages of compilation. It would be trivial to stop doing this, but it would make maintaining packages more work, which does nothing to improve security.
The reason Debian builds from curated tarballs is that they are curated by a human, and signed with a well known key. They could certainly build from git instead. But would that improve the situation? Not all projects sign their release tags. And for those that do, it is more likely to be automated. We, the collective, want changes to be vetted first by the upstream maintainer, then by the package maintainer, and would prefer these entities to be unrelated.
This time the process was successfully attacked by a corrupt upstream maintainer, but that does not mean we should do away with upstream maintainers. Several backdoor attempts have been stopped over the years by this arrangement and that process is not something we should throw away without careful consideration.
The same improvements we have been talking about for years must continue: We should strive for more reproducible builds. We should strive for lower attack surface and decrease build complexity when possible. We should trust our maintainers, but verify their work.
There are a lot of bad to terrible takes here, ranging from hindsight 20/20 to borderline discriminatory:
3. The issue here has more to do with the fact that the generated tarball doesn't match the source. You (i.e. distro owners) should be able to generate the tarball locally and compare it with the released artifact. Autotools is just a scapegoat.
4. xz is used in a lot of places. Reducing dependencies is good, but trying to somehow say this is all systemd's fault, for depending on liblzma is not understanding the core issue here. The attacker could have found another dependency to social engineer into, or find a way to add dependencies and whatnot. It's very easy to say all these stuff in hindsight.
5. Again, I agree with you in principle that dependencies and complexity are a big issue and I always roll my eyes when people bring in 100's of dependencies, but xz is a pretty reputable project. I really really doubt someone would have raised an issue with adding liblzma or thought that the build script would introduce a vulnerability like that. Again, a lot of hindsight talking here, instead of actually looking forward to how something like this could realistically be prevented. Too many dependencies are a problem, but it's not like everyone will suddenly write their own compression libs.
6. Again, I mean, I don't disagree with you in principle, but that is not the lesson from this particular incident. This may be your pet peeve, but it wasn't like the integration with libsystemd would have raised anyone's alarm.
8. This is just a thinly veiled way of saying "don't work with anyone of Chinese descent". I don't want to use the R word but you know exactly what I mean. There's no evidence Jia Tan is Chinese anyway, or that this was done by China. We simply don't know right now, and as far as we know they could have used any western-sounding name. The core issue here is that the trust was misplaced, and the overworked maintainer didn't try to make sure the other person is a real one (e.g. basic Googling). So what, if you don't work with anyone Chinese but someone is called "Ryan Gosling", you automatically trust them?
---
I do agree with point 7.
The other problem is that C’s engineering culture is termites all the way down.
A test resource getting linked into a final build is, itself, a problem - the tooling should absolutely make this difficult, and transparent/obvious when it happens.
But that’s difficult because C never shed the “pile of bash scripts” approach to build engineering… and fundamentally it’s an uphill battle to engineering a reliable system out of a pile of bash scripts.
The oft-discussed problems with undefined behavior, obscure memory/aliasing rules, etc are just the obvious smoke. C is termites all the way down and really shouldn’t be used anymore, it’s just also Too Big To Fail. Like if the world’s most critical infrastructure had been built in PHP.
Well, so no news?
But seriously, yes, I think I've seen people dismissing each one of those points. And now we have concrete proof they are real. The fact that somehow a scandal like this didn't happen before due to #1, 2, or 3 is almost incredible... in the sense that a viable explanation is that somebody is suppressing knowledge somewhere.
Point 8 simply isn't going to happen. And that means that if you want secure OSS, you must pay somebody to look around and verify those things. And the problem with that is this means you are now into the software vendor political dump - anybody that gets big doing that is instantaneously untrustworthy.
Overall, my point is that we need some actual democratic governance on software. Because it's political by nature, and pushing for anarchy works just as well as with any other political body.
No system is safe from bad actors.
The only way to armor yourself is to have consistent policies. Would this have happened if there were code reviews and testing?
Consistency is key. At my workplace we routinely bypass branch protections, but we're only responsible for a few customers.
This is ridiculous, nobody "encourages" every service to depend on it for initialization notifications, you can implement the logic in 10 lines of code or less.
9. We should move toward formal verification for the trusted core of systems (compilers, kernel, drivers, networking, systemd/rc, and access control).
With regard to 1, there are some other practical steps to take. Use deterministic builds and isolate the compilation and linking steps from testing. Every build should emit the hashes of the artifacts it produces and the build system should durably sign them along with the checksum of the git commit it was built from. If there need to be more transformations of the artifacts (packaging, etc.) it should happen as a separate deterministic build. Tests should run on a different machine than the one producing the signed build artifacts. Dropping privileges with SECCOMP for tests might be enough but it's also unlikely to be practical for existing tests that expect a normal environment.
I think your eighth point is regrettable but mostly true - I'd soften it to professional relationships, which kind of sucks for anyone trying to get started in the field who doesn't get a job with someone established, and adds an interesting wrinkle to the RTO discussion since you might "work" with someone for years without necessarily knowing anything about them.
It also seems like we need some careful cultural management around trust: enshrine trust-but-verify pervasively to avoid focusing only on, say, Chinese H1-Bs or recent immigrants (whoops, spent all of your time on them and it turns out you missed the Mossad and Bulgarian hackers) and really doubling down on tamper-evidence, which also has the pleasant property of reducing the degree to which targeting OSS developers makes sense.
Combining your 7th point with that one, I’ve been wondering whether you could expand what happened with OpenSSL to have some kind of general OSS infrastructure program where everyone would pay to support a team which prioritizes supporting non-marquee projects and especially stuff like modernizing tool chains, auditing, sandboxing, etc. so basically any maintainer of something in the top n dependencies would have a trusted group to ask for help and be able to know that everyone on that team has gone through background checks, etc.
I'd add a 9: performance differences can indicate code differences. Without the 0.5s startup delay being noticed the backdoor wouldn't have been found. It would be much easier to backdoor low-performance software that takes several seconds to start than something that starts nearly instantly.