Xz: Can you spot the single character that disabled Linux landlock?

phoe-krk
90 replies
1d4h

So that function checked if the following C code compiled, and only in that situation enabled the landlock?

Except that lone period, hard to recognize because of its small size and proximity to the left edge of the diff, caused the C code to become always invalid, hence keeping the landlock always disabled?

That's both vilely impressive and impressively vile. I didn't even spot it on my first read-through.

082349872349872
46 replies
1d4h

I just got a little more respect for pythonic whitespace-sensitivity

EDIT: come to think of it, even that might not have done much here, where well-formedness is the issue :(

capitainenemo
24 replies
1d3h

Yeah, if anything, Python worsens the situation. I had a friend DOS our server because he accidentally inserted a tab, which made one statement look like it was inside a block when it was actually outside it. He swore off Python at that point. I personally avoid the language, but I understand that due to issues like that, mixing tabs and spaces is now an error (or is it just a warning?) by default. Regardless, it still seems pretty silly to me to have whitespace play such a significant role, besides the fact that I find it visually harder to read, like text without punctuation or capitalisation.

katbyte
11 replies
19h23m

white space as a delimiter is why i never use python.

naikrovek
10 replies
19h16m

white space as a delimiter is why i never use python.

Whitespace is a delimiter in (almost?) all languages humans use.

Whitespace determining which scope you’re in is one of the many problems of making whitespace significant, which might be what you meant.

ratmice
5 replies
18h21m

Regarding your "almost?" query: there was a debate over the Ogham space mark in Unicode. It is considered whitespace, though, with the rationale that it is sometimes visible and sometimes invisible, depending upon whether the text has a stem-line. That doesn't make the set of non-whitespace delimited languages empty. Perhaps there is one with an always-visible delimiter that didn't get the whitespace justification, but Ogham does at least give us one human language delimited by a printable character (which happens to be whitespace).

eru
4 replies
16h6m

That doesn't make the set of non-whitespace delimited languages empty.

Well, there's also the opposite: Goldman Sachs's Slang allows space as part of identifiers.

skissane
3 replies
8h5m

Well, there's also the opposite: Goldman Sachs's Slang allows space as part of identifiers.

Many SQL implementations permit whitespace in identifiers, but then you need to use quoted identifiers.

eru
2 replies
8h2m

That's reasonably sane of SQL. In Slang, you don't need to quote. (The syntax is still unambiguous. In principle, eg Python could do something similar, because they don't have any existing syntax where you just put two identifiers next to each other with only a space in between. But C could not, because variable declaration is just two identifiers, one for the type and one for the variable name, next to each other with a space in between.)

skissane
1 replies
7h39m

In Slang, are “x y” with different number of spaces in the middle different identifiers or different spellings of the same identifier? SQL standard says different identifiers

eg Python could do something similar, because they don't have any existing syntax where you just put two identifiers next to each other with only a space in between

The interaction with keywords would cause some issues. For example, right now, “if” is not a valid identifier (keyword), but “if_” and “_if” are. However, with this proposal “x y” could be a valid identifier, but “x if” would introduce ambiguity

But C could not, because variable declaration is just two identifiers, one for the type and one for the variable name, next to each other with a space in between

This is one aspect of C syntax I have never liked. I always wish it had been Pascal-style `x:int;` instead of `int x;`

eru
0 replies
2h57m

In Slang, are “x y” with different number of spaces in the middle different identifiers or different spellings of the same identifier? SQL standard says different identifiers

Sorry, I don't remember, and I can't seem to find it out online. I could ask my friends who still work there, if it's important. (For what it's worth, I never remember anyone complaining about mixing up a different number of spaces in their variable names. So either all numbers of spaces were treated the same, or perhaps multiple spaces in a row in an identifier were just banned (via linter or via language?).)

The interaction with keywords would cause some issues. For example, right now, “if” is not a valid identifier (keyword), but “if_” and “_if” are. However, with this proposal “x y” could be a valid identifier, but “x if” would introduce ambiguity

Yes, you would need to sort out these details, if you wanted to add this 'feature' to Python.

This is one aspect of C syntax I have never liked. I always wish it had been Pascal-style `x:int;` instead of `int x;`

I'm glad Rust made the same decision.

I do like using the 'space operator' to denote function calls, at least for a language like Haskell or OCaml.

tomoyoirl
2 replies
18h52m

“The thing that controls scope is the count of certain nonprinting characters, which happen to come in multiple widths” is reasonably insane, yes

eru
1 replies
16h8m

Which non-printing characters are you talking about? Whitespace characters are very much printable.

Yes, I agree that Python should just forbid tabs. As a second best, you can tell your linter (and/or formatter) to forbid tabs (and any weird unicode whitespace). That's basically the equivalent of compiling your C with -Wall.

takeda
0 replies
11h39m

This was an issue in Python 2, where for some dumb reason it allowed mixing tabs and spaces and equated tab to 8 spaces (I think). Python 3 doesn't have that issue.

WirelessGigabit
0 replies
15h13m

Human language is also way more ambiguous. One of the reasons I love coding is a massively reduced vocabulary and a way stricter grammar.

secstate
9 replies
1d2h

Mixing tabs and spaces usually throws a runtime exception. I'm not gonna make a value judgement about that, but your story doesn't make sense based on how I understand py3

Edit, sorry, shoulda read your whole comment before replying

capitainenemo
8 replies
1d2h

Yep. It was a few years ago, while that was still allowed (as I'd noted ;) ), but regardless, significant whitespace is just annoying. There's a lot of things that render as whitespace, and source code one might be reviewing could be printed wrapped or copied and pasted in odd ways. Other languages are more robust to this.

_visgean
5 replies
18h59m

Never had this happen; it's largely eliminated by using tools like black or other autoformatters.

MrDresden
2 replies
10h31m

The point still stands that this is an inherent possibility in the language itself.

eru
0 replies
10h9m

Well, in C the visual indentation and the meaning of the code (given by {}) can diverge. That's even worse, and happens in practice. Many style guidelines for C have evolved specifically to address that problem with 'the language itself'.
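
A minimal, made-up example of that divergence (nothing from the xz code; the names are invented): the indentation claims both lines are guarded by the `if`, but the compiler only listens to the braces, or here to their absence.

    #include <stdio.h>

    int main(void) {
        int needs_cleanup = 0;
        if (needs_cleanup)
            printf("releasing lock\n");
            printf("freeing buffer\n");  /* runs unconditionally, despite the indentation */
        return 0;
    }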

_visgean
0 replies
7h42m

I think it's more of a tooling issue. If your editor / diff viewer can't differentiate between different whitespace characters, get a better one. Also, if you want to ensure some function isn't in local scope etc., just test it.

probably_wrong
0 replies
6h9m

I feel that the main point still stands, though. Saying that Python doesn't have a whitespace problem because you can send the code through a tool that detects whitespace-related problems still acknowledges the existence of said problem.

eru
0 replies
16h6m

And/or linters.

This issue is no worse than having multiple different variables that look the same and differ only in obscure, same-looking Unicode symbols.

A linter can handle this. As can a decent font, or editor settings.

heavenlyblue
0 replies
16h10m

You can enable showing whitespace characters in your editor; in PyCharm they are visualised well enough not to distract from the non-whitespace.

deanishe
0 replies
3h54m

There's a lot of things that render as whitespace

Like what? As you note, mixing tabs and spaces is now an error.

I've never understood the objection to semantic whitespace. Complaining about having to indent code correctly strikes me as being akin to complaining about having to pee in the toilet instead of the bathtub.

raverbashing
0 replies
18h51m

Yes, mixing tabs and spaces is a big no-no and rightfully throws an error now.

fragmede
0 replies
15h19m

humans are weird creatures sometimes. there was this bad thing that happened that won't happen again now, but now I can't use the thing forever because Reasons.

philsnow
15 replies
19h0m

AST diffs instead of textual diffs might have helped here (to spot the `.` making the code un-compilable).

Edit: oof, though the stray character in question is inside a perfectly legitimate C string, so to catch this, any such diffs would need to Matryoshka down and that seems unsolvable / intractable.

raverbashing
8 replies
18h53m

Ah yes but thanks to C being cursed due to includes and macros this is harder to do

uecker
7 replies
18h47m

Huh, the code with a dot is not legal C. It is a CMake issue that the test breaks here.

acdha
3 replies
17h17m

That's what makes this so clever: these systems were born in the era where you couldn't trust anything - compilers sometimes emitted buggy code, operating systems would claim to be Unix but had weird inconsistencies on everything from system calls to command line tool arguments, etc. - so they just try to compile a test script and, if it fails, assume that the feature isn't supported. This is extremely easy to miss since the tool is working as it's designed, and since it's running tons of stuff, there's a ton of noise to sort through before realizing that something which failed was supposed to have worked.

josefx
2 replies
9h33m

You could just run tests for the feature detection on a known system or a dozen (VMs are cheap). The big problem is that most code is not tested at all, or test errors are flat out ignored.

acdha
1 replies
5h42m

Nothing “just” about it. VMs weren't cheap when the C world started using autoconf - virtualization was a feature mostly known on IBM mainframes - and in any case that couldn't do what you need. The goal is not to figure out what's there on systems _you_ control; it's to figure out what is actually available on the systems other users are building on. Think about just this library, for example: your dozen VMs wouldn't even be enough to cover modern Linux, much less other platforms, but even if they were (drop most architectures and only support the latest stable versions for the sake of argument), you'd still need to build some reliable prober for a particular bug or odd configuration.

The problem here wasn’t the concept of feature detection but that it was sabotaged by someone trusted and easily missed in the noise of other work. What was needed was someone very carefully reviewing that commit or build output to notice that the lockdown feature was being disabled on systems where it was fully enabled, and we already know that maintainer time was in short supply. Any other approach would likely have failed because the attacker would have used the same plausible sounding language to explain why it needed to be a complicated dynamic check.

josefx
0 replies
1h42m

The goal is to figure out if the optional features work at all. For that a system you control is required.

That the sabotage worked at all relied on the fact that nobody was testing those features.

How secure is a sandbox that may not even exist? Apparently it's good enough for most.

hermitdev
2 replies
15h37m

That's kind of the point. It's feature detection code. If the code cleanly compiles, the feature is assumed supported; otherwise, it's assumed not present/functional. This is pretty common with autotools. The gotcha here is that this innocuous-looking period makes the snippet syntactically invalid. It's meant to be subtle and to always disable the feature.

asvitkine
1 replies
14h38m

Shouldn't whatever is depending on xz supporting landlock be verifying that it's the case through blackbox tests or something? Otherwise a check like this even without the bug could end up disabling landlock if e.g. the compiler environment was such that a given header wasn't available...

saagarjha
0 replies
13h49m

Yes, this is intentionally how autoconf is supposed to work.

mewpmewp2
4 replies
16h23m

I think the main issue was that it was embedded in the file itself like that. I would have preferred to have it in a separate valid C file, with syntax highlighting etc., and have it parsed from that file.

tsimionescu
3 replies
11h1m

Perhaps, but given how most build systems work, that would complicate things in other ways (since build systems often try to compile all .c files).

throwaway2037
1 replies
8h24m

I am a CMake novice. Is that true for CMake in this example?

jcelerier
0 replies
6h57m

Definitely not, cmake's try_compile function even directly supports being passed a source file.

mewpmewp2
0 replies
5h42m

Put it in a special folder that is ignored from the build.

eru
0 replies
16h4m

Edit: oof, though the stray character in question is inside a perfectly legitimate C string, so to catch this, any such diffs would need to Matryoshka down and that seems unsolvable / intractable.

Not sure; you could also just forbid code that's too complex to analyse, instead of trying to go down the rabbit hole.

In general, it's hard to analyse arbitrary code. But we don't have to allow arbitrarily complex code when doing code review.

yegle
4 replies
19h9m

In Python, you only need to indent `def test_foo` by an additional whitespace character to make it a locally scoped function instead of a proper test function.

eru
3 replies
16h9m

No, not if you have any sane linter or formatter involved. They wouldn't let you get away with indenting by a single space, but only by multiples of whatever you normally use in the rest of your program.

jrockway
2 replies
14h50m

I mean, some CI system should be checking that "if_code_compiles()" blocks compile somewhere. It should be an error until the CI system has that header and can test both variants.

People are really quick to add optionality like this without understanding the maintenance cost. (Every boolean feature flag increases the number of variants you need to test by 2!) Either make a decision, or check that both sides work. Don't let people check in dead code.

asddubs
1 replies
10h48m

multiplies it by 2, not increases

jrockway
0 replies
35m

yup, meant to say "a factor of 2".

patrakov
35 replies
15h15m

Even more evil would have been to replace this line

    (void)SYS_landlock_create_ruleset;
with this:

    (void)SYS_landloсk_create_ruleset;

Terr_
19 replies
15h5m

For those squinting: the regular "c" in "landlock" has been replaced with a Cyrillic "с", U+0441.

SV_BubbleTime
17 replies
14h26m

Is there a GCC option to error on non-standard English characters?

masklinn
8 replies
12h2m

Erroring is the point here, and it's what the period achieved. It's a feature-detection snippet, so if it fails to compile, the feature is disabled.

rmast
7 replies
11h24m

It seems like there should be a way to catch these types of “bugs” - some form of dynamic analysis tool that extracts the feature detection code snippets and tries to compile them; if they fail for something like a syntax error, flag it as a broken check.

Expanding macros on different OSes could complicate things though, as could determining what flags to build the feature-check code with — so perhaps filtering based on the type of error would be best done as part of the build system functionality for doing the feature checking.

masklinn
3 replies
11h3m

if they fail for something like a syntax error, flag it as a broken check.

A syntax error might be exactly what they’re looking for e.g. they’re feature testing a new bit of syntax or a compiler extension.

so perhaps filtering based on the type of error would be best done as part of the build system functionality for doing the feature checking.

Which would require every compiler to have detailed, consistent, and machine-readable failure reporting.

rmast
1 replies
3h24m

At least for newer C++ standards it seems like there is decent support for feature test macros, which could reduce the need for a feature check involving compiling a snippet of test code to decide if a feature is available: https://en.cppreference.com/w/cpp/feature_test

Handling the output from several of the most recent GCC and Clang versions would probably cover the majority of cases, and add in MSVC for Windows. If the output isn’t from a recognized compiler or doesn’t match expectations, the only option is falling back to current behavior. Not ideal, but better than the current status quo…
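
For C itself, the closest analogue I know of is `__has_include` (a long-standing GCC/Clang extension, standardized in C23). A minimal sketch, with the caveat that a header being present doesn't guarantee the syscalls behind it actually work:

    #include <stdio.h>

    /* Preprocessor-level probing instead of a compile-and-see test. */
    #if defined(__has_include)
    #  if __has_include(<linux/landlock.h>)
    #    define HAVE_LINUX_LANDLOCK_H 1
    #  endif
    #endif

    int main(void) {
    #ifdef HAVE_LINUX_LANDLOCK_H
        puts("landlock header present");
    #else
        puts("landlock header absent (or __has_include unsupported)");
    #endif
        return 0;
    }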

estebank
0 replies
1h32m

That would only work for projects that care only about current compilers, whereas C in general has more desire to support niche compilers.

A mitigation here would be to make the result of autoconf only provide instructions for humans to change their build config, instead of doing it for them silently. The latter is an anti-pattern.

FWIW, the approach you propose is how the UI tests for rustc work, with checks for specific annotations on specific lines, but those have the benefit of being tied to a single implementation/version and modified in tandem with the app. Unless all compilers could be made to provide reasonable machine readable output for errors, doing that for this use case isn't workable.

humanrebar
0 replies
6h6m

Universal support of SARIF by compilers would be most of that.

humanrebar
2 replies
6h7m

I'd prefer if the ecosystem standardized on some dependency management primitives so critical projects aren't expected to invent buggy and insecure hacks ("does this string parse?") in order to accurately add dependencies.

rmast
1 replies
3h35m

It would be interesting to see what the most common compile feature checks are for, and see what alternative ways could be used to make the same information available to a build system — it seems like any solution that requires libraries being updated to “export” information on the features they provide would have difficulties getting adoption (and not be backwards compatible with older versions of desired dependencies).

estebank
0 replies
1h38m

From my experience looking at rust builds, Pareto applies here: most checks are trivial and used by a lot, and a handful are super complex and niche.

pajko
1 replies
7h50m

Probably the code review tools should be hardened as well, to indicate when extended identifiers have been introduced on a line where there weren't any before. That would help catch the replacement of a 'c' character with a Russian one.

Btw, the -fno-extended-identifiers compiler parameter gives an error if UTF-8 identifiers are used in the code:

    <source>:3:11: error: stray '\317' in program
        float ω₃ = 0.5f;

Terr_
0 replies
59m

Probably the code review tools should be hardened as well, to indicate when extended identifiers have been introduced on a line where there weren't any before.

Maybe in the future more languages/ tools will have the concept of per-project character sets, as opposed to trying to wrangle all possible Unicode ambiguity problems.

I suppose then the problem is how to permit exceptions when integrating with some library written in another (human) language.

Terr_
1 replies
14h9m

Disclosure: I've got zero C/C++ on my resume. I was asked to diagnose a kernel panic and backport kernel security patches once, but it was very uncomfortable. ("Hey, Terr knows the build system, that's close enough, right?")

That said, perhaps something like disabling the default -fextended-identifiers [0], and enabling the -Wbidi-chars [1] warning.

[0] https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html...

[1] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#inde...

rurban
0 replies
10h49m

Not yet. I was working on the standardization for C23, but this was postponed to C26, and it doesn't have much support. MSVC and SDCC liked it; Clang and GCC not so much.

Here is my bad variant of the feature, via confusables: https://github.com/rurban/gcc/tree/homoglyph-pr103027

The better variant would be to use my libu8ident, following UTR 39. I only did that for binutils.

patrakov
0 replies
3h58m

Well - that would be exactly the point of the attacker. GCC errors out, and it does not matter whether this is because the intentionally typoed header does not exist or because non-English characters are not allowed. Errors encountered during compilation of test programs go to /dev/null anyway, as they are expected not to compile successfully on some systems - that's exactly the point of the test. So no, this option would not have helped.

jcelerier
0 replies
7h3m

I mean, GCC erroring is how the exploit works here. cmake tries to compile the source code: if it works then the feature is available, if it fails then the feature is not available. Forcing the build to fail is what the attacker wants.

fweimer
0 replies
9h49m

The -fno-extended-identifiers option seems to do something in this area, but I don't know if it is sufficient. But it may block some characters which are used in standard English (for some values of standard).

ak39
0 replies
6h6m

Yip, this was impressive. Only a copy & paste into VSCode (Windows) revealed the rectangle around the "c" in the second line.

Impressive.

signa11
6 replies
15h7m

compiler would perhaps "see" it ?

josephg
5 replies
14h18m

Sure; but so long as the compiler error is silently discarded (as it is here), the configure script will assume landlock isn't available and never use it.

smashed
4 replies
13h1m

A misplaced punctuation mark has some plausible deniability. Like the author could say he was distracted and fat-fingered nonsense, an honest mistake.

A utf character from a language that has zero chance of being mapped to your programmer's keyboard in the middle of a code line, that would be obvious intentional tampering or at the very least raise some eyebrows.

yeputons
1 replies
9h39m

A utf character from a language that has zero chance of being mapped to your programmer's keyboard in the middle of a code line, that would be obvious intentional tampering or at the very least raise some eyebrows.

I accidentally get a Cyrillic "с" in my code a few times a year. It's on the same key as Latin "c", so if I switch keyboard layout, I don't notice until the compiler warns me. Easy to do if I switch between chats and coding. Now my habit is to always type a few characters in addition to "c" to check what layout I'm _really_ using.

Granted, it's easier with a one-letter variable called "c", but with longer names I can easily see myself not typing "c" the first time (e.g. not pressing hard enough), starting a build, chatting in some other window, getting back, facepalming, and "fixing" the error by adding "c" or "с" depending on my keyboard layout.

Bulat_Ziganshin
0 replies
8h54m

Even worse, I have Punto Switcher, which automatically switches language when I start typing. With the default config, it changes a Latin c to the Russian one, because Russian includes the word "с" while it's nonsense in English.

And since it's the only Cyrillic character placed on the same key as the identical-looking English character, I don't even see the problem with my eyes when the auto-replacement fires.

skywhopper
1 replies
8h31m

I mean, I agree that the punctuation mishap is a better cover story, but why would any particular language have “zero chance” of being mapped to the programmer’s keyboard?

deanishe
0 replies
4h33m

why would any particular language have “zero chance” of being mapped to the programmer’s keyboard?

You left out a very important qualifier: in the middle of a code line

The chances of someone accidentally fat-fingering a keyboard layout switch, typing one character, then switching back again without noticing while typing a line of code are very slight indeed.

Plus the fact that you'd need to speak a language written in that alphabet to have a plausible reason for having the keyboard layout enabled in the first place.

kaashif
3 replies
4h15m

With the . the author can claim it was unintentional. It's impossible (or at least very hard) to claim the Cyrillic c is unintentional.

To me, that makes the . more evil.

patrakov
1 replies
4h8m

Well, on Russian keyboards, they are on the same key.

kaashif
0 replies
40m

Good point - it might be plausible if the author actually were Russian, but they aren't.

cocoa19
0 replies
2h39m

Plausible deniability

saagarjha
1 replies
13h50m

Putting random Unicode confusables in source code would be far easier to consider malicious

mgaunard
0 replies
7h18m

Except many people have a policy of ASCII only in source code and therefore would catch it immediately.

dathinab
0 replies
9h38m

After the concept of such attacks blew up (idk, one? two? three? years ago), a lot of places now warn at least when mixing "languages"/ranges, or sometimes on any usage of non-US-ASCII printable characters.

So stuff like that is increasingly reliably caught.

Including by 3rd parties doing any ad-hoc scanning.

Though a "compiles check" failing with a syntax error definitely should be added to the list of auto-scan checks, at least in C/C++ (it's not something you would do in most languages tbh; even in C it seems to be an antipattern).

Etheryte
0 replies
6h26m

Nearly every modern IDE and diff viewer instantly highlights this though? I doubt this would get far.

dathinab
4 replies
9h42m

It's also iffy, tbh, that the compilation-check functionality:

- doesn't force users to differentiate between syntax errors and other errors (e.g. symbols not found). Partially you can't even make it work properly if you want, due to C macros.

- seems sensible at all: tbh, if "does it compile" checks for compatibility seem sensible, then there is so much wrong with the ecosystem in so many ways.

mysidia
3 replies
8h33m

It's standard autoconf stuff, BUT... the right thing to do for a security option is to throw an error at the configure stage if the option is requested but can't be supported, and not to silently turn it off.

That is because a human should have to look at the build and manually make the decision to switch off a security feature.

lazide
2 replies
4h9m

Autoconf output is so incredibly noisy, though, that it's still long odds anyone would notice anytime soon.

dullcrisp
1 replies
49m

If you break the build they’ll notice

lazide
0 replies
14m

Good thing they disabled the check that would have broken the build/tests in this case eh?

sidewndr46
0 replies
1d4h

I agree. The way I read this, I missed it completely until I searched for it. I was looking too closely at the rest of the code around defines

kevincox
0 replies
3h19m

It also shows how brittle these sorts of checks are in general. I really think that this auto-disable of features based on compilation was a big mistake. It is not only very annoying when packaging things (Why does feature X not work? Because you forgot to include libY in the build environment, of course), but these checks also fail silently even when that isn't intended. While this `.` was almost certainly malicious, it is easy to imagine an accidental break: maybe a minor typo, or `-Werror` is enabled and one of these functions was marked as must-use, or something similar.

It seems like it is infinitely better to have a default set of features and explicitly require the user to say `--enable-x --disable-y` as needed. Then error out if a required library isn't available. (Preferably saying which feature requires it.)

kristopolous
22 replies
16h14m

This has plausible deniability on it.

There are better ways to hide it, like swapping in Unicode lookalike characters. Some of them even pixel-match, depending on the font.

Maybe I'm out of the loop but is intentionality settled here?

n2d4
13 replies
16h6m

Unicode lookalikes would be detected by IDEs and other tools.

There would be plausible deniability in a different situation, but this is the same author who implemented the backdoor and several similar changes that disable security features. I don't think the benefit of the doubt is deserved here.

kristopolous
9 replies
15h24m

Ok there's a larger question here about the bazaar software development method.

How can we ensure say, that Microsoft doesn't pay someone to throw a wrench in libre office development or Adobe to sabotage Gimp?

There's lots of deception strategies for bad faith actors and given the paucity of people who actually do the work, it's really really hard to be picky.

Especially with the complexity of library dependencies. Defect-at-a-distance may actually be a common strategy and this is the first that was caught

Dalewyn
8 replies
14h51m

How can we ensure say, that Microsoft doesn't pay someone to throw a wrench in libre office development or Adobe to sabotage Gimp?

Microsoft and Adobe have reputations to uphold long into the future.

Is that infallible? Hell no it isn't, but consider that Jia Tan only needed to uphold his reputation insofar as getting his backdoor onto everyone's systems. Once that is done, his reputation or the lack thereof becomes a non-issue because his work is done. We're lucky we caught him just before the finish line.

The likelihood of demonstrably reputable parties like Microsoft and Adobe poisoning the well is practically nil because they don't have a finish line.

kristopolous
5 replies
14h22m

Those companies are famous for skullduggery.

They can secure the dominance of their offering against the open source competition for well under 500k a year. It's a no brainer.

What this might look like would be, say, poorly discernible icons, clumsy UI design, or an unstable API that makes plugins constantly break. Large volumes of documentation that are inadequate or inaccurate in critical places, etc.

If I was malicious I'd pay someone to write otherwise functional code and bug fixes but make really user hostile decisions whenever possible.

We should be vigilant about this in a few key projects. Tech companies could easily be doing this already.

yread
1 replies
8h19m

1. Identify particularly unproductive employees. The people who really muck things up.

2. Make them contribute to a competing open source project.

3. Profit!

Rexxar
0 replies
3h44m

3. Discover the employee was not unproductive because he was bad but because of internal bureaucracy

lazide
0 replies
4h4m

It would be hard for them to not get caught with their hands in the cookie jar - corp employees have high turnover, and low loyalty past their employment.

Even paying a cutout consulting company to do it would be iffy: since so many finance employees would be involved in paying them, it would leak sooner rather than later - or at least raise a lot of questions that would be hard to answer legitimately. Being a public company, it's also hard to avoid auditor scrutiny.

Even a private company would have an easier time.

Nation state actors though? No such issues - that’s literally why security clearances, compartmentalization, and ‘you never really leave’ are done the way they are.

And they don’t have to worry about market blowback unless they get really excessively heavy handed (like France did in the late 80’s-90’s). They are setup to do shady stuff, it’s their raison d'etre.

[https://en.m.wikipedia.org/wiki/Industrial_espionage#:~:text....]

bawolff
0 replies
13h14m

Those companies are famous for skullduggery

There are levels of skullduggery. Hiring someone to pretend to work for a competitor while secretly sabotaging them is a whole other level of skullduggery with a lot of liability attached. I don't think that would be worth it to them.

Dalewyn
0 replies
11h36m

Well since you seemingly want to paint Microsoft and other such companies in a bad light, let me point out to you that it's actually Microsoft who brought awareness to this very problem: Andres Freund works at Microsoft.[1][2]

It is probably prudent for you (and other like-minded individuals) to be more careful who you think your enemies really are. Infighting against friends and allies, or even neutral parties, is exactly what your real enemies would want you to do.

[1]: https://www.linkedin.com/in/andres-freund

[2]: https://twitter.com/AndresFreundTec

deckard1
1 replies
13h18m

You say all this after recent documents were revealed about Facebook intercepting and analyzing Snapchat encrypted traffic via a man-in-the-middle conspiracy.

esafak
0 replies
13h14m

What kind of a reputation do you think Facebook has?

asveikau
2 replies
15h16m

I'm not sure that an IDE will catch a syntax error in C code quoted inside a cmake script trying to test if things compile.

n2d4
0 replies
14h57m

A lot of IDE configurations, such as VSCode on the default config, highlight Unicode characters that are invisible/look like others (eg. cyrillic characters or non-breaking space) or even all non-ASCII characters, regardless of whether they're in a string or not.

Try pasting any of these into your IDE and see if it catches it. https://gist.github.com/StevenACoffman/a5f6f682d94e38ed80418...

brailsafe
0 replies
15h4m

In either case, it would appear as added code, not different characters in the same code, so maybe about the same. If someone changed one character in an existing string though, I think it would be more likely caught visually by someone used to seeing accidental garbage show up

zzzeek
6 replies
16h7m

the period right there on the left edge? if I saw that in a patch I'd be through the roof, that looks completely intentional

throwaway2037
1 replies
8h19m

I love this type of hindsight-is-20/20 comment. "If I saw...". That is a BIG if. Plenty of smart people on that mailing list also missed it. I missed it myself when I opened the HN lead link. Very subtle.

zzzeek
0 replies
4h11m

I don't use mailing lists for patches; I use Gerrit, which would put a great big highlight over that character.

renewiltord
0 replies
3h9m

Double spacebar gives you period in some contexts on Mac.

kristopolous
0 replies
15h31m

Not me. Maybe someone is using an editor with a . key mapped to something. It's in a pretty convenient place.

:wq!

aeyes
0 replies
15h43m

I'd never suspect this to be intentional if I spotted it in a patch, even given the consequences in this particular case. I have, several times, written and committed things into code that I meant to type into some other window. Without a linter I probably wouldn't spot an extra dot when reviewing my own change before sending it out.

Tuna-Fish
0 replies
3h27m

I distinctly remember having to remove such a superfluous . that I accidentally added into a file on multiple occasions.

If you are using vi and your left shift key is dodgy, that can easily happen by accident.

Tuna-Fish
0 replies
3h30m

Maybe I'm out of the loop but is intentionality settled here?

Yes. The exploit used a payload that was stored in the tests directory as a binary compression test case, but which is very clearly an intentional exploit payload.

All the sneaky stuff was about deploying that payload without it being obvious.

jstanley
15 replies
1d4h

What does the dot do?

Aurornis
8 replies
1d4h

The function is “check_c_source_compiles”. The comment indicates that the intention is to confirm that the Landlock functionality can be compiled on the system, in which case it will be enabled.

The stray dot isn’t valid C, so it will never compile. By ensuring it can never compile, Landlock will never be enabled.
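
Roughly the shape of the probe program such a check feeds to the compiler (a paraphrase for illustration, not the exact snippet from the xz commit): if this translation unit compiles, the build assumes Landlock is usable, and a stray character anywhere in it silently flips the answer to "no".

    #include <linux/landlock.h>
    #include <sys/syscall.h>
    #include <sys/prctl.h>

    int main(void)
    {
        /* Reference the symbols so the probe fails if any of them is missing. */
        (void)prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
        (void)SYS_landlock_create_ruleset;
        (void)SYS_landlock_restrict_self;
        return 0;
    }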

astrange
5 replies
13h26m

configure would print that it's not enabled, so it seems like the kind of thing people would eventually notice.

masklinn
3 replies
11h53m

Maybe, eventually, but how many people read the reams of garbage autoconf spouts out until the feature they wanted fails to materialize?

makomk
1 replies
6h45m

The actual compiler output from the autoconf feature test is one of the things I'd probably look at fairly early on if some feature is disabled when it shouldn't be, but I maybe have a bit more experience running into problems with this than younger folks.

masklinn
0 replies
6h15m

if some feature is disabled when it shouldn't be

That's my point, you're going to look at autoconf's output if a feature you're expecting is missing.

But would you think to have a test or expectation for landlock in xz? How would you even check if landlock is enabled for a process?
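
One way to probe at runtime is to ask the kernel which Landlock ABI it supports; a sketch, with the caveat that this only shows the kernel offers Landlock, not that xz actually restricted itself with it:

    #define _GNU_SOURCE
    #include <stdio.h>
    #include <string.h>
    #include <errno.h>
    #include <unistd.h>
    #include <sys/syscall.h>
    #include <linux/landlock.h>

    int main(void)
    {
        /* With this flag the syscall returns the highest supported ABI version. */
        long abi = syscall(SYS_landlock_create_ruleset, NULL, 0,
                           LANDLOCK_CREATE_RULESET_VERSION);
        if (abi < 0)
            printf("landlock unavailable: %s\n", strerror(errno));
        else
            printf("landlock ABI version %ld\n", abi);
        return 0;
    }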

astrange
0 replies
10h6m

I used to and it's actually how I got a large part of my computer skills, but unfortunately I got medicated for ADHD and became normal and can't do it anymore.

But I still think reading boring logs is a great way to understand a system. Turning on all the intermediates for a compiler (llvm ghc etc) is very educational too.

takeda
0 replies
11h19m

Looks like he only introduced that "bug" in CMake. Was there a migration in progress from autoconf to CMake?

ak39
1 replies
5h55m

Shouldn't there be a unit test to confirm landlock is on/off? (I mean, this seems a crucial aspect of the code which needs 100% test coverage.)

patrakov
0 replies
3h46m

This is not something that a unit test can catch. First, this 100% coverage rule applies to the program/library code, and only to the subset that is not ifdeffed out (otherwise, you will not be able to have, e.g., Windows-specific code), and definitely not to the build system's code. Second, how would you test that landlock works, in a unit test, when this feature is optional and depends on the system headers being recent enough? You can't fail a unit test just because the software is being compiled on an old but still supported system, so it would be a "SKIPPED" result at best, which is not a failure and which is not normally caught.

thrtythreeforty
5 replies
1d4h

Fails the build, so that Landlock support is never enabled.

usr1106
3 replies
13h24m

I don't think it fails the build. It's part of a test, trying to compile a bit of code. If it compiles, the test is true and a certain feature is enabled. If the compilation fails, the feature is disabled.

This is quite common in build systems. I had such a test produce incorrect results in the kernel build system recently. The problem is that the tests should really look carefully for an expected error message. If the compilation fails with the expected message the test result is false. If the compilation fails for some other reason the build should fail so the developer is forced to investigate.

Disclaimer: I have not studied the xz build system in detail, just what I saw from the diff plus a bit of extrapolation.

jcelerier
1 replies
6h50m

Problem is that every compiler/compiler version will have different messages for the same error. And xz is the kind of project that gets built on really really random homegrown compilers. Maybe we need an option to make the compilers output a standard JSON or whatever...

MereInterest
0 replies
4h53m

Maybe we need an option to make the compilers output a standard JSON or whatever...

While I like the idea, I fear this would go the same route as UserAgent strings, with everybody giving carefully crafted lies. The main issue is that bugs are only known in retrospect.

As an example, suppose Compiler A and Compiler B both implement Feature X. Therefore, they both produce JSON output indicating that they support Feature X. A year later, developers for Compiler A find and fix a bug in its implementation of Feature X. Now, how do we handle that case?

* Update the version for Feature X as reported by Compiler A? Now Compiler B is publishing the same version as the buggy Compiler A did, and would get marked as not having a usable implementation.

* Update the spec, forcing a version update for Feature X in both Compiler A and Compiler B? Now a bugfix in one compiler requires code updates in others.

* In the build system, override the JSON output from Compiler A, such that Feature X is marked as unavailable for the buggy versions? Now the JSON is no longer trustable, as it requires maintaining a list of exceptions.

* Try to trigger the bug that Compiler A just fixed, so that the build system can recognize the buggy versions, regardless of the JSON output? Now we're back to the starting point, with features determined based on attempted compilation.

akdev1l
0 replies
12h14m

It fails that small build that is testing for landlock functionality. Hence it doesn’t build the support for it.

It doesn’t fail the overall build.

surajrmal
0 replies
3h19m

It feels like this is a poor way to determine whether a feature should be enabled. There are just too many reasons it could fail, and defaulting to off on compilation failure just seems like a glaring security bug. If this is a common practice, it seems like the community should rethink this choice.

asveikau
10 replies
18h59m

It seems like Lasse Collin is back on the scene and maybe pissed?

Dalewyn
8 replies
17h20m

If he is innocent and a victim as much as everyone else in all this, I won't blame him for wanting blood.

Most of us are humans after all, and being social creatures we tend to take violations of trust quite deeply.

danielhlockard
7 replies
17h8m

Truly seems that way currently. He said he'd really dig in starting next week and just checked his email on vacation and saw this whole mess.

asveikau
6 replies
17h3m

It may be hard for him to re-establish trust. Maintaining xz for more than a decade then doing this would be quite a "long con" but if HN threads are any indication, many will still be suspicious.

His commits on these links look legit to me. It's a sad situation for him if he wasn't involved.

londons_explore
4 replies
11h26m

The fact GitHub suspended his account too suggests that they might have info saying he is involved.

asveikau
1 replies
3h35m

I think it was somewhat irresponsible to block everything. It hampers instigation of the repo's history. It's good that Lasse had another mirror.

asveikau
0 replies
1h26m

Woops, too late to edit this comment and say *investigation

sReinwald
0 replies
9h31m

Personally, I doubt that. I would assume that GitHub just banned all accounts that were directly associated with the project as a precaution.

cricalix
0 replies
10h51m

Or you know, they just reacted with a ban hammer on all accounts related to xz and to heck with innocence.

delfinom
0 replies
3h55m

Honestly, he should call it quits and just drop xz utils and move on with life. He maintained it for 15 years and struggled to get anyone else to help until Jia showed up. Meanwhile the Linux ecosystem depended on it.

The Linux ecosystem will either figure shit out and maintain it or move into a different archive library.

TLLtchvL8KZ
0 replies
11h1m

I worry about his home/personal system. This guy seems to have befriended Lasse in order to get the position; is it possible Lasse himself has been compromised?

I'd be nuking everything, possibly even disposing of hardware, and setting up a new verifiable online identity on a clean system/hardware.

Aeolun
5 replies
17h50m

I feel like my version control system is better about highlighting the changed characters than these solid green or red strings.

XorNot
1 replies
17h35m

I don't love unified diffs as a rule either. They're very noisy to read in general.

A side by side in something like "meld" will highlight changes but also means you're reading the full context as it will exist in the code base.

My number one complaint about "code review" is the number of people who simply run their eyes over the diff (in private orgs) because management says "code review" and that's as far as they ever take that in terms of defining outcomes.

eru
0 replies
16h2m

I love unified diffs for many applications. You are right that side by side diffs have their uses, too.

TheRealPomax
1 replies
17h47m

It's git, so it can show a vastly better diff, just not from a URL with hardcoded diff settings.

Aeolun
0 replies
17h35m

Yeah, but I’m wondering where the code review was done. In a different context, would this be easier to spot?

masklinn
0 replies
11h58m

It's a giant block of new code; what is it supposed to do beyond tell you it's a giant block of new code?

Note that this is C code inside of a cmake string. Even if your diff can do highlighting, the odds it would highlight that are low, and if it did, highlighting is generally just lexing, so there wouldn't be much to show.

undebuggable
0 replies
1d3h

This is so vile that even if caught red-handed during PR one could shrug off "oh, my IDE's auto formatting did this".

eacapeisfutuile
29 replies
15h42m

Why is that accepted? Serious question

kristjansson
19 replies
15h10m

He had unfettered access to xz’s git?

eacapeisfutuile
17 replies
15h4m

Isn't it a bit ironic how much of the code everyone depends on can be freely altered by some unknown party, while so much time goes into code reviews to verify internal changes at most companies?

deckard1
6 replies
13h3m

you can have ten comments about the name of a variable, but no one bats an eye at a new npm package introduced. Also, devs that wrote code that Google depends on can't pass the leetcode gate check to get a job there.

Our industry is a laughingstock.

throwaway2037
3 replies
8h3m

The last sentence is an overreach to me, but I have experienced much of the same bike-shedding during code reviews. 95% of them are useless. Read that twice; I am not joking, sadly. I am not against code reviews, but in my experience(!), the reviewers are not incentivized to do a thorough job. Seriously, if you deliver a new feature vs. do a deep, difficult code review, which one benefits you more? To repeat: I don't like it; I am only pointing out the perverse incentives to rush during code reviews.

One more thing that I don't see enough people talking about here: Writing good software is as much about human relationships (trust!) as it is about the computer code itself. When you have a deeply trusted, highly competent teammate, it is normal to put less effort into the review. And to be clear, I am talking about the 99% scenarios of writing the same old CRUD corporate code that no one gets excited about here. Please don't reply to this post with something like, "Well, I work on Dragon at SpaceX and all branches have 2x code coverage and the QA team has 600 years of combined experience... Yada..."

peteradio
2 replies
5h4m

One hour doing code review is not really stolen from doing feature work, is it? For the vast majority, it's stolen from playing video games or some other non-work.

eacapeisfutuile
0 replies
2h8m

That is very subjective, and far from what I’ve seen.

Jevon23
0 replies
51m

That heavily depends on the individual developer and the organization in question.

In general, the most highly skilled developers who are most capable of doing a thorough code review are also the ones who are most likely to be genuinely over capacity as is.

surajrmal
1 replies
3h10m

I'm not sure about your experience, but companies which have strong code review practices also have strong controls on third-party code. In terms of review granularity, it makes more sense to be critical of maintenance/readability for code you actually own and maintain. Third-party code has a lower bar, and should, although I also believe it still needs to be reviewed.

eacapeisfutuile
0 replies
1h52m

Yeah I think this is the common case. I think we usually trust that dependency A took a look at their dependency B and C before releasing a new version of A. And even if properly reviewing our bump of A, how often do we check out changes in B and C

Edit: yes for FAANG-ish companies this is usually a bit different, for this reason. And licenses..

cookiengineer
4 replies
13h51m

Some might say RMS was right all along.

darthrupert
3 replies
8h12m

I would actually say that he is completely wrong in this case. Open source created this problem.

cookiengineer
1 replies
5h21m

And you think proprietary code doesn't have this problem?

Can you prove it? Where's the evidence? ;)

eacapeisfutuile
0 replies
2h5m

Lack of proof in any direction is approaching the core issue here.

SAI_Peregrinus
0 replies
3h38m

The problem never would have been fixed in proprietary software. And it's unlikely the problem would have been considered anything more than a 0.5s startup delay in some situations if xz were proprietary; it would have been reported as a performance issue to the malicious maintainer, who would have treated it as such and improved the startup time.

HideousKojima
1 replies
12h44m

while so much time goes into code reviews to verify internal changes at most companies

Maybe at FAANGs, but I work at a massive company and code review is non-existent.

eacapeisfutuile
0 replies
2h2m

I think this is almost the default even outside of FAANG; of course there are companies not applying it.

2pEXgD0fZ5cF
1 replies
9h38m

while so much time goes into code reviews to verify internal changes at most companies

Very easy claim to make. Difficult to verify.

eacapeisfutuile
0 replies
7h22m

Yes, this is anecdotal, but I think there’s a fair share invested into it (from my experience)

takeda
0 replies
11h14m

And he was in the project for just 2 years.

junon
0 replies
3h13m

Only on Github. Not on the Tukaani mirror.

foooorsyth
7 replies
15h37m

Because nobody’s really paying attention.

“LGTM!”

eacapeisfutuile
6 replies
15h33m

Generally yes, but ripping all conditions out of SECURITY.md should at least raise an eyebrow?

indrora
5 replies
15h28m

Nobody was watching. Plain and simple.

If you have commit access to it, and nobody is there to see, nothing stops you.

eacapeisfutuile
4 replies
15h22m

Yes, but if that's the sentiment, how is this not as problematic as the npm ecosystem?

snazz
3 replies
15h13m

It’s similarly problematic but on a somewhat smaller scale and with fewer levels of nested dependencies.

eacapeisfutuile
2 replies
14h59m

I’m not sure this would be smaller scale? At least probably too early to tell?

snazz
1 replies
4h3m

I just mean fewer total packages and fewer maintainers. Linux libraries and packages don’t have the culture of making a package out of a single small function and importing it everywhere, which is part of the reason why NPM is a good case study in opportunities for supply chain attacks.

eacapeisfutuile
0 replies
2h0m

Yes, but the distribution likely depends on it, making it more widespread even without the middleman dependencies.

jijijijij
26 replies
1d2h

I don't know enough about C and complex builds, but the proposed change appears to be kind of a red flag, even without the breaking dot.

    -    check_include_file(linux/landlock.h HAVE_LINUX_LANDLOCK_H)
    ...
    +    check_c_source_compiles("
    +        #include <linux/landlock.h
Is a compilation-test a legitimate/common/typical method to go about this?

Independently of the breaking code, it seems to me that accidental failing, or even accidentally not failing, would be in the nature of such an assessment... So this commit seems to raise the question "why?", even if you missed the dot, doesn't it? If a feature is formally available but effectively broken somehow, wouldn't you want the compiler to complain, instead of the feature being dropped silently? Is the reasoning in the code comment sound? Can you test if syscalls are defined in another way?
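
One alternative, as a sketch (MY_HAVE_LANDLOCK is a made-up project macro here, not how xz is actually structured): since the SYS_* constants are plain preprocessor macros from the system headers, the gating can live in the project's own C source, where a stray character breaks the real build loudly instead of silently flipping a probe:

    #include <sys/syscall.h>

    #ifdef SYS_landlock_create_ruleset
    #  define MY_HAVE_LANDLOCK 1   /* hypothetical project-level macro */
    #endif

    int main(void)
    {
    #ifdef MY_HAVE_LANDLOCK
        /* set up the Landlock sandbox here */
    #endif
        return 0;
    }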

ronsor
25 replies
1d1h

Is a compilation-test a legitimate/common/typical method to go about this?

Yes—in fact, compilation tests are often the only way you can tell if a feature actually works. It's extremely common for C build systems to detect and work around weird systems.

jijijijij
23 replies
1d1h

Is this by design, or by legacy? I mean, is there a better way to do this? Seems really flawed to me.

ninkendo
9 replies
19h18m

It’s by design. The job of autotools is to find “ground truth” about whatever environment you’re compiling against. It’s meant to discover if you can use a feature by actually seeing whether it works, not just by allow-listing a known set of compiler or library versions. This is because the whole point is to allow porting code to any environment where it’ll work, even on compilers you don’t know about. Think back to a time when there were several dozen Unix vendors, and just as many compilers. You don’t want your build script to report it can’t compile something just because it isn’t aware of your particular Unix vendor… you want it to only fail if the thing it’s trying to do actually doesn’t work. The only way to do this is by just testing if certain code compiles and produces the expected result.

Aeolun
8 replies
17h44m

I don’t know. In my code I’d always compile and check at runtime?

ninkendo
3 replies
17h16m

Some of the checks are there to tell whether or not the compiler even supports your code. You may not be able to compile your code at all, and the job of the build system is sometimes to just emit useful errors to help the person building the code to understand that they need a compiler which supports [language feature X].

Again, this is intended to be portable software. It is designed to work on lots of OS’s, with lots of compilers, in a lot of future environments that don’t even exist yet.

If you have a security feature for example, which uses the pledge() syscall on OpenBSD, but you can only use that feature on OpenBSD systems, you have two choices:

- Conditionally compile it based on whether you’ve detected that this is an OpenBSD target at build time, or,

- Conditionally compile it based on whether some sample code that uses pledge() builds successfully.

You can’t defer this decision until runtime, because it would require linking to pledge() symbols even though they may not exist on this system, which would cause the executable to fail to link at runtime, unless you completely rearchitected to use a plugin model, which is overkill.

So given the above are your main two options, the latter is preferred mainly because it allows new systems to come in and be compatible with old ones (maybe someone adds pledge() support to Linux one day) without having to fudge the uname command or something. This was super important in the early Unix days… perhaps less so now, but it’s still a good way to write portable software that still can take advantage of platform-specific features.
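
A sketch of what that second option ends up looking like in the source, once the build system has set a macro; HAVE_PLEDGE is a conventional autoconf-style name I've made up for the example, not something taken from xz:

    #include <stdlib.h>

    #ifdef HAVE_PLEDGE       /* defined by the build system if the probe compiled and linked */
    #include <unistd.h>      /* pledge() lives here on OpenBSD */
    #endif

    static void drop_privileges(void)
    {
    #ifdef HAVE_PLEDGE
        if (pledge("stdio rpath", NULL) == -1)
            abort();
    #endif
        /* On platforms without pledge(), this is simply a no-op. */
    }

    int main(void)
    {
        drop_privileges();
        return 0;
    }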

jiggawatts
2 replies
13h8m

Again, this is intended to be portable software.

A scathing criticism of the OpenSSL library by the BSD team was that it was too portable in a (very real) sense that it wasn't even written in "C" any more, or targeting "libc" as the standard library. It would be more accurate to say that it was "Autotools/C" instead. By rewriting OpenSSL to target an actual full-featured libc, they found dozens of other bugs, including a bunch of memory issues other than the famous Heartbleed bug.

Platform standards like the C++ std library, libc, etc... are supposed to be the interface against which we write software. Giving that up and programming against megabytes of macros and Autotools scripts is basically saying that C isn't a standard at all, but Autotools is.

Then just admit it, and say that you're programming in the Autotools standard framework. Be honest about it, because you'll then see the world in a different way. For example, you'll suddenly understand why it's so hard to get away from Autotools. It's not because "stuff is broken", but because it's the programming language framework you and everyone else is using. It's like a C++ guy lamenting that he needs gcc everywhere and can't go back to a pure C compiler.

Job ads should be saying: "Autotools programmer with 5 years experience" instead of "C programmer". It would be more accurate.

PS: I judge languages by the weight of their build overhead in relation to useful code. I've seen C libraries with two functions that had on the order of 50 KB of macros to enable them to build and interface with other things.

tsimionescu
1 replies
10h29m

C basically has no standard library. It's no surprise to anyone who has ever used it more than in passing that you depend on the chosen build system to replace that. Building portable C libraries is very different because of this from any other commonly used programming language - even C++.

kzrdude
0 replies
2h46m

It's ironic to me to say that C has no standard library while at the same time libc is one of the most important libraries that most programs on an installed system can't go without.

So it has one, but it's small. It has a few useful functions, for example system(const char*) which was used by the exploit.

Denvercoder9
2 replies
17h37m

Not everything is detectable at runtime, such as syscall numbers (which is what's being tested for here).

manwe150
1 replies
14h6m

Runtime syscall number detection is very common in practice, since the kernel returns ENOSYS to enable that exact ability for glibc and other shim libraries
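
For example, a program that already knows the syscall number at build time can ask the kernel whether the syscall is actually implemented, roughly like this (a sketch; the flags value 1 is assumed to stand in for LANDLOCK_CREATE_RULESET_VERSION so the probe doesn't need linux/landlock.h):

  /* Runtime availability check: the SYS_* number must exist in libc's
     headers at build time, but the kernel answers ENOSYS at runtime
     if it doesn't implement the syscall. */
  #include <errno.h>
  #include <stdio.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int main(void)
  {
  #ifdef SYS_landlock_create_ruleset
      long ret = syscall(SYS_landlock_create_ruleset, NULL, 0, 1U);
      if (ret < 0 && errno == ENOSYS)
          puts("kernel does not implement Landlock (ENOSYS)");
      else
          printf("Landlock ABI version (or other error): %ld\n", ret);
  #else
      puts("libc headers do not define SYS_landlock_create_ruleset");
  #endif
      return 0;
  }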

Denvercoder9
0 replies
4h33m

That's detecting syscall _availability_, not its number. There's no way to ask the kernel "what's the number of the syscall commonly known as read()".

ksherlock
0 replies
17h9m

You'll never make it to runtime if you try to include headers that don't exist. You'll never make it to runtime if you try to link in libraries that don't exist.

db48x
9 replies
18h57m

It’s by “design”, in the sense that C and C++ provide no better way to really know for sure that the functions you want to call really exist. In more modern languages we rely on metadata and semver, but none of that exists for C and C++.

Denvercoder9
3 replies
16h48m

That only works for language features though, it doesn't allow detecting OS/library features.

db48x
1 replies
13h15m

Very true, but don’t forget that Autoconf checks for “interesting” compiler choices as well as library and OS features. And then there is libtool, which abstracts out the differences between how compilers generate shared libraries so that you only have to understand one way of doing it and it will work on all of them.

cryptonector
0 replies
13h5m

Autoconf lets you check for all sorts of things:

  - instruction set architecture
  - OS versions
  - ABIs
  - libraries (whether they are installed)
    - and what functionality they provide
  - commands/executables
  - anything you can write a macro to check
All of this is too disparate for the OS to reliably answer every question you might have about it and about the stuff installed on it. You also can't wait for such a system to learn how to answer every question you might have, so some things you can only detect yourself, either at build configuration time, build time, or run time.

uecker
3 replies
18h45m

I never test for this in this way in my C projects; I rely on version tests only.

db48x
2 replies
18h4m

Yea, it’s not impossible to do, just not standardized. If you use a library and it provides useful version information, then definitely use it. It’s just that the language or the tooling doesn’t force libraries to have that kind of metadata. Compare that with Rust, where every library you use must come with a standardized manifest that includes a version number, and where they tell library authors up front that they are expected to follow the semver convention.

The fact is that things have gotten easier over the last 10 or 20 years. It used to be the case that any program targeting Unix had to either spend a lot of time and energy tracking the precise differences between dozens of different commercial Unices, or use autoconf. Autoconf was the project that combined all of that lore into a single place, so that most people didn’t have to know every single detail. But these days most of the Unices are dead and buried, and 99% of all new projects just target Linux. Kinda sucks if you prefer one of the BSDs, or OpenSolaris/Illumos/SmartOS, but it does mean that new Linux developers never have to jump through those hoops and simply never learn about autoconf. And while on the one hand that represents a loss of knowledge and competency for the community (making this type of supply-chain attack much easier), on the other hand autoconf is in practice an abomination. It is (or at least was) extremely useful, but it was implemented in M4 and reading the source code will literally damage your brain.

acdha
1 replies
16h33m

It used to be the case that any program targeting Unix had to either spend a lot of time and energy tracking the precise differences between dozens of different commercial Unices, or use autoconf. Autoconf was the project that combined all of that lore into a single place, so that most people didn’t have to know every single detail.

As a data point, the place I worked for in the mid-90s had a single codebase with something over 600 permutations of supported OS and compiler when you included the different versions. One thing we take for granted now is how easy it is to get and install updates – back then you might find that, say, a common API was simply broken on one particular operating system version but your customers would have to wait for a patch to be released, sometimes purchased, put onto floppy disks or CD-ROM, and then manually installed in a way which had enough risk involved that people often put it off as long as they could. Some vendors also did individual patches which could be combined by the sysadmin so you had to test for that specific feature rather than just saying “is it greater than 1.2.3?”, and it wasn’t uncommon to find cases where they’d compiled a common library with some weird patches so you had to test whether the specific features you needed functioned.

Part of why Linux annihilated them was cost but much of it was having package managers designed by grownups - I remember as late as the mid-2000s bricking brand new Sun servers by running the new Solaris updater, which left them in an unbootable state.

secondcoming
0 replies
19h7m

Back in the day, when people compiled source from tarballs on their personal machines, the autoconf script would query your system to see what functionality it supported. It did this by trying to compile small programs for each feature. If the compilation failed, it assumed the feature was unavailable on your system and a flag was set or unset for the rest of the build.

kccqzy
0 replies
19h4m

Have you ever tried to manually build something from a release tarball by starting with ./configure? If so, have you observed how many times the compiler is invoked in this configure phase before you even run make?

cryptonector
0 replies
17h37m

It's not by design, unlike what siblings say. It's by accident (so, "legacy", as you put it).

The problem is that already in the 80s there was tons of variability from one Unix system (or version of it) to the next, but there was no standard way of representing what features/standards/APIs/libraries a system supported or had installed. When faced with such a mess, people wrote code to detect which features are present on the target host.

This then got made into tools with libraries of detection code. Think autoconf/autotools.

Now we also have pkgconfig, but it's too late and it was not enough anyways.

Some things you might only detect at run-time, provided that their ABIs are stable enough that you can copy their headers into your application.

Nezghul
0 replies
9h41m

That "check_c_source_compiles" function should first test if the provided code snipped is "valid C code in general" and only then check if it compiles in given system.

karmakaze
11 replies
1d3h

I can't believe that system security is dependent on such a loose chain of correctness. Any number of things could have stopped this.

  +   # A compile check is done here because some systems have
  +   # linux/landlock.h, but do not have the syscalls defined
  +   # in order to actually use Linux Landlock.
Fix those headers on those systems to explicitly opt-out. What's the point of headers if they don't declare their capabilities?

Also, why isn't there a single test after a binary blob (even when compiled from open source) is made to ensure security is intact?

I wouldn't even ship a website checkout without end-to-end tests for core capabilities. There must be a priority misalignment of adding features > stability.

Edit: I hope the 'fix' isn't to remove the '.'--I just saw the other post on HN that shows removing the '.'

missblit
2 replies
17h6m

You are quoting Jia Tan [1]. The malicious actor wrote that comment when deliberately breaking the check in the first place.

Fixing headers or extra tests would not have prevented this, as there is no indication the headers were broken in the first place, and extra tests could have been compromised (or ignored for release tarball) some other way.

[1] https://git.tukaani.org/?p=xz.git;a=commit;h=328c52da8a2bbb8...

manwe150
1 replies
14h12m

Should the better fix then have been to revert the bad commit (the one with the malicious commit message), rather than just deleting the dot (as was done)?

humanrebar
0 replies
5h55m

The better fix would be to move to proper dependency management and depend on a version range of a dependency instead of hoping you can do a better job of modelling the same with strings hardcoded into a CMakeLists.txt.

kardos
2 replies
1d1h

Well, do we know if that commented code you quoted is accurate?

karmakaze
1 replies
23h16m

It's the content of TFA, so flag it if you believe it isn't.

kardos
0 replies
21h51m

Yes, that is what I mean: the commit in TFA is from the bad actor, so the quoted comment is suspect.

pas
0 replies
17h51m

... there's a huge bias in which things get tested (and how they get tested). Easy-to-test things get tested. There are a lot more developers, committers, and maintainers for web stuff. There's also a cultural issue: it's hard to improve old, ossified processes/projects, etc.

deathanatos
0 replies
14h29m

I'm pretty certain the comment isn't accurate, and is just more subterfuge.

Randalthorro
0 replies
15h47m

He introduced a whole new testing framework then slowly put a backdoor in it over 2 years

RaisingSpear
0 replies
7h35m

Tangentially related, but I've had a case where GCC 9.4.0 shipped a broken arm_acle.h header [https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100985] - code which includes the header would always fail to compile.

Since users could be trying to compile it on GCC 9.4.0, and as the functionality dependent on the header isn't critical to the application, if the build detects the header is broken, it just disables the functionality and emits a warning.

Back to xz: If security isn't your top priority, it seems reasonable to just ignore the failure of an optional feature and just move along. Of course, in hindsight, it's easy to point fingers, but without that, it doesn't sound completely unreasonable to me.

Hnrobert42
0 replies
17h50m

As a user of open source who doesn’t know enough to make those suggestions, I would be grateful if you would develop and contribute them.

clnhlzmn
10 replies
18h26m

Would it be reasonable to expect that this MR comes along with a test that shows that it does the thing it’s claiming to do? I’m not sure how that would work in this case.. have a test that is run on a system that is known to have landlock that does something to ensure that it’s enabled? Even that could be subverted, but it seems like demanding that kind of thing before merging “features” is a good step.

viraptor
6 replies
18h4m

I'd say this MR is a bad approach in general. The headers say what interfaces are known, not what features are available. You should be able to compile with landlock support on a system which doesn't enable it. Same situation as seccomp and others. Your build machine doesn't have to match the capabilities of the target runner.

But yeah, to test it, you can have a mock version of landlock which responds with the error/success as you want, regardless of what the system would normally do. It relies on the test not being sabotaged too though...

raimue
5 replies
17h55m

Read the code of the check again. It mostly checks that the required SYS_* constants are defined to be able to use the syscalls. You can compile this on a system that does not have landlock enabled in the running kernel, but the libc (which imports the kernel system call interface) has to provide the syscall numbers.

viraptor
4 replies
17h47m

You're right. I didn't see SYS... symbols being actually used, but they are: https://git.tukaani.org/?p=xz.git;a=blob;f=src/xz/sandbox.c;...

This doesn't change my opinion in general - that version should be exposed through a library call and knowing about the specific syscalls shouldn't be needed in xz.

Denvercoder9
3 replies
16h46m

I see your point, but suggesting adding an additional library dependency while we're discussing a supply chain attack is quite ironic.

viraptor
2 replies
16h17m

Should've said function call not library call. My bad. Basically, if you already have linux/landlock.h, that should provide everything you need without explicit references to SYS...

raimue
1 replies
5h41m

Now we are running in circles. As you see in the git commit, the compile check was added because the existence of linux/landlock.h alone was not enough to check that the feature can be used.

This header defines the data types for the Linux kernel interface, but not how the syscall landlock_create_ruleset(2) will be issued. That is provided by libc either as a separate wrapper function (does not exist in glibc) or the implementation needs to use syscall(SYS_landlock_create_ruleset, ...), with the constant also being provided by libc. That is how it works for all syscalls and you won't be able to change this.
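
Concretely, a Landlock user ends up writing thin wrappers like these (a sketch; the access right is just an example, and error handling is minimal):

  #include <linux/landlock.h>   /* struct landlock_ruleset_attr, LANDLOCK_* flags */
  #include <sys/prctl.h>
  #include <sys/syscall.h>      /* SYS_landlock_* numbers, provided by libc */
  #include <unistd.h>

  /* glibc has no landlock_create_ruleset() wrapper, so the raw syscall
     number is the only way to reach the kernel interface. */
  static int create_ruleset(const struct landlock_ruleset_attr *attr,
                            size_t size, unsigned int flags)
  {
      return (int)syscall(SYS_landlock_create_ruleset, attr, size, flags);
  }

  static int restrict_self(int ruleset_fd, unsigned int flags)
  {
      return (int)syscall(SYS_landlock_restrict_self, ruleset_fd, flags);
  }

  int apply_minimal_sandbox(void)
  {
      struct landlock_ruleset_attr attr = {
          .handled_access_fs = LANDLOCK_ACCESS_FS_WRITE_FILE, /* example right */
      };
      int fd = create_ruleset(&attr, sizeof(attr), 0);
      if (fd < 0)
          return -1;
      /* Mandatory before restricting: give up the ability to gain privileges. */
      if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
          return -1;
      int ret = restrict_self(fd, 0);
      close(fd);
      return ret;
  }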

SAI_Peregrinus
0 replies
3h49m

The only source of the claim that the existence of linux/landlock.h is insufficient is (AFAICT) the malicious git commit. Why trust the comment, written by the attacker, to explain away a malicious change?

masspro
1 replies
18h8m

I like the idea of testing build-system behaviors like this, and I don’t think it’s ever really done in practice. Scriptable build systems, for lack of a better name for them, sit at a bad intersection: Turing complete, hard to test across different cases, hard to reason about, hard to read, and treated by most of us as “ugh, I hope all this stuff works” and, if it does, “thank god I get to ignore all this stuff for another 6 months”.

azakai
0 replies
14h14m

If you mean testing the "disable Landlock if the headers and syscalls are out of sync" functionality then I agree, workarounds for such corner cases are often not fully tested.

But it would have been enough here to have a test just to see that Landlock works in general. That test would have broken with this commit, because that's what the commit actually does - break all Landlock support.

Based on that it sounds like there wasn't a test for Landlock integration, if I've understood things correctly.

adrianmonk
0 replies
18h10m

Create a tiny fake version of landlock with just the features you're testing for. Since it's only checking for 4 #defines in 3 header files, that's easy.

Then compile your test program against your fake header files (with -Imy-fake-includes). It should compile without errors even if landlock is missing from your actual system.

Then build your test program a second time, this time against the real system headers, to test whether landlock is supported on your system.
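
A sketch of what those stub headers could contain (the four symbols below are the ones the probe is usually described as checking, so treat the exact list as an assumption; the numeric values are only placeholders for illustration):

  /* my-fake-includes/linux/landlock.h */
  #define LANDLOCK_CREATE_RULESET_VERSION (1U << 0)

  /* my-fake-includes/sys/syscall.h */
  #define SYS_landlock_create_ruleset 444
  #define SYS_landlock_restrict_self  446

  /* my-fake-includes/sys/prctl.h */
  #define PR_SET_NO_NEW_PRIVS 38
  int prctl(int option, ...);

  /* Compile the probe against the stubs first (this should always succeed
     if the probe itself is well-formed C), then against the real headers:
         cc -Imy-fake-includes -fsyntax-only probe.c
         cc -fsyntax-only probe.c                                          */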

usr1106
5 replies
13h12m

Where/how was landlock supposed to be used?

I guess you cannot really use it in a generic library like compression/decompression. The library has no clue what the program is supposed to do and what should be restricted.

For a program it might be clearer.

The sshd attack was using liblzma as a library. So disabling landlock seems unrelated? A sign that there is more bad code waiting to be detected / had been planned to be inserted???

akdev1l
2 replies
12h10m

Presumably sshd itself used to lock down its own capabilities after a certain point of execution.

Removing the landlock means the daemon doesn’t lock itself down and will allow for better payload execution when they get to the exploitation stage.

I don’t think these two things are unrelated. I think they already had payloads in mind and realized this would be a hurdle.

usr1106
0 replies
10h30m

It's pretty much impossible to lock down sshd. The restrictions are inherited by all descendant processes. What would the sysadmin say if they found themselves in a restricted shell?

jwilk
0 replies
11h24m

No. OpenSSH doesn't use Landlock, and even if it did, this patch wouldn't affect it.

bawolff
1 replies
13h4m

It's weird. I would consider doing two unrelated backdoor-esque things in the same project really sloppy. It seems like it just significantly increases the risk of being discovered for minimal gain.

It's very confusing. Parts of this saga seem incredibly sophisticated while other parts seem kind of sloppy.

kijin
0 replies
11h4m

Dude is a developer just like the rest of us. We all try to write clever code from time to time, but at other times write sloppy crap. Especially if there's a boss insisting on a tight deadline.

CGamesPlay
5 replies
16h6m

On an unrelated note, this malware team has assembled a great dataset for training AIs on identifying security problems. Every commit has some security problem, and the open source community will be going through and identifying them. (Thanks, maintainers, for the cleanup work; definitely not fun!)

luyu_wu
1 replies
15h46m

One of the cooler uses of AI I've seen!

wizzwizz4
0 replies
15h5m

Currently it doesn't work, but yeah, it'll be really cool when we have tech like that! (It'll still only be able to detect known vulnerabilities, but we don't often invent new ones.)

chilling
1 replies
13h57m

I'm hearing about this for the first time; can you post some info about it?

CGamesPlay
0 replies
10h31m

Like in this article, we have a patch that introduces a security flaw (disabling landlock). We later have a patch that fixes it, specifically. The job of the LLM is to reproduce the fixing patch given the problem patch. Or at the very least, explain that this patch results in landlock always being disabled. To be clear, this problem is much, much harder than the problems LLMs are solving now, requiring knowledge of autotools behavior that isn’t included in the context (identifying that a failed build disables the feature, and that this build always fails).

There was another example where this team submitted a patch that swapped safe_fprintf for fprintf while adding some additional behavior. It was later pointed out that this allows printing invisible characters to the stream, which allows hiding some of the files that are placed when decompressing.

bawolff
0 replies
13h9m

I doubt it will generalize well. At best it's just an arms race.

trelane
4 replies
18h47m

I wonder why it was useful to prevent Landlock from being enabled in xz. Perhaps a later stage was to inject malicious content into xz archives? But then why not just inject malicious activity in xz itself?

ColonelPhantom
2 replies
18h20m

Because a deliberate vulnerability is much easier to hide than actual malicious content.

One could probably sneak a buffer overflow or use-after-free into a C project they maintain without being noticed. Actually shipping a trojan is much harder, as observed with the xz-to-sshd backdoor.

trelane
1 replies
17h38m

Ah, so the next stage would have been to add a "bug" in xz that would trigger during the supposedly sandboxed execution, when presented with certain input files. Clever.

puetzk
0 replies
15h44m

Well, it's also quite possible that adding such a bug was the previous stage. Or even just having found one that you didn't report/fix...

A1kmm
0 replies
17h57m

Unless there are more subtle backdoors that target xz itself beyond the targeting of ssh. Clearly the aim was to be subtle.

GrayShade
4 replies
1d4h

Only in CMake, this time. Not in the autotools version.

almostnormal
3 replies
17h2m

Does autotools compile only, or try to link? There's no main().

jwilk
0 replies
11h37m

That's not autotools.

jwilk
0 replies
11h36m

AC_COMPILE_IFELSE indeed only compiles, so there's no need for main().

There's AC_LINK_IFELSE if you want to test linking too.

snnn
3 replies
18h22m

So for each optional feature we may need three build options:

1. Force enable

2. Enable if available

3. Force disable

Like,

   --enable_landlock=always
   --enable_landlock
   --disable_landlock

glandium
2 replies
17h58m

My rule of thumb is that things should never be disabled automatically. Make the test hard fail and print a message that the feature can be disabled.

cperciva
0 replies
14h58m

In my code I have a bunch of routines optimized for different platforms, e.g. using x86 AESNI instructions. Not all compilers support them, and they don't even make sense when compiling for a different CPU architecture.

It's much simpler to say "enable this if we can compile it" rather than detecting the compiler and target platform and throwing a mess of conditional compilation into an autoconf script.
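
For instance, the probe for the AESNI case can be as small as this (a sketch; the configure script would typically also pass whatever compiler flag is needed, such as -maes, when building it):

  /* If this compiles, build the AESNI-optimized routines;
     otherwise silently fall back to the generic C implementation. */
  #include <wmmintrin.h>

  int main(void)
  {
      __m128i block = _mm_setzero_si128();
      __m128i key   = _mm_setzero_si128();
      block = _mm_aesenc_si128(block, key);
      (void)block;
      return 0;
  }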

ComputerGuru
0 replies
16h21m

That’s how you make unusable/uncompilable software. It might be a good rule for something security critical like ssh but not as a general rule.

pxx
3 replies
1d4h

What was even the game here? Eventually even more backdoors, ones that would have more plausible deniability? AFAICT neither oss-fuzz nor this change would actually have discovered the backdoor that was found.

But why put your backdoor eggs into one basket (library)?

dagmx
1 replies
19h15m

Who says it was just the one library though?

dylan604
0 replies
18h46m

waiting for a new bot to scan everyone's repos to find "." and then spam every repo with false positives

bayindirh
0 replies
1d3h

The library is entrenched enough, trusted enough, and its main developer takes long breaks from the internet because of mental health problems.

Plus, you do not backdoor the library itself, but the tools using it. "Reflections on trusting trust" style.

Sounds like a perfect plan, until it isn't.

legobmw99
2 replies
1d3h

I’m not quite following what the diff here is suggesting - was this some cmake magic to detect if a feature was enabled, but the file had an intentional syntax error?

db48x
0 replies
18h59m

Autoconf and CMake both compile small test programs to verify that the feature in question actually works. The test programs almost never actually do anything; they just refer to the feature or function that they rely on. If there is a compile or linker error then the feature isn’t available. In this case the compiler always outputs an error because the test program doesn’t have valid C syntax.

Of course in practice you shouldn’t be writing each test program by hand. Autoconf has macros that generate them; you would only need to supply an identifier or at most a single line of code and the rest would be created correctly. I’m sure CMake is similar in that regard, so the first red flag is that they wrote a whole test program.
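
From memory, the probe that AC_CHECK_FUNC([pledge]) generates looks roughly like this - you only supply the name, never the program (treat the exact shape as approximate):

  /* Roughly what autoconf emits for AC_CHECK_FUNC; the bogus prototype is
     deliberate, because only linking against the symbol matters here. */
  char pledge (void);

  int main (void)
  {
      return pledge ();
  }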

bean-weevil
0 replies
1d3h

That's exactly right. It was checking if the c code compiled to detect the landlock feature, and there was a single period in the middle of the code that made it always fail to compile and thus silently leave the feature disabled.

mjcohen
1 replies
58m

This may be naive, but it seems to me that a failed compile should generate an error worth worrying about.

kzrdude
0 replies
37m

It's in an autoconf-style compile test. Whether or not the snippet compiles is the thing being tested, unfortunately. That just decides the configuration; it doesn't stop the build.

acqq
1 replies
9h40m

Now I'd like to have a link to that patch in code used by Apple that also appeared to be a typo. Does anybody remember it?

Cloudef
1 replies
7h12m

This is why I really don't like configure-style build systems that automatically enable/disable features. When I want something, I explicitly opt in to it. If there's a good reason for a feature to be on by default, then explicitly allow opting out instead.

kevincox
0 replies
3h13m

This is a huge pain when packaging things. You set up the package, add dependencies until it builds and think you are done. But then feature X is missing. What? Oh, it was silently disabled because libY isn't available. Ok, go back and add it. Then a user reports that feature Z isn't available...

Yeah, just have a default set of features and allow `--enable-a --disable-b` as needed.

The fact that it silently swallows bugs in the check is just another problem with it.

rossjudson
0 replies
18h1m

There I am, scanning carefully, and I see a period where one clearly should not be. "This wasn't so hard", I said to myself. I poked my screen, and the period moved. Curse you, monitor dust particle.

leni536
0 replies
10h22m

Optional features should not depend on feature detection, ever. Feature detection for a security feature should be suspicious, even if it works as intended.

Optional features should always be configured by whoever tries to compile the code. There can be defaults, but they shouldn't depend on the environment.

SillyHNDorks
0 replies
6h45m

Can you spot the single character that disabled Linux landlock?

No, and why should I?

JSDevOps
0 replies
1d3h

Yeah, I wasn't sure what the diff is alluding to here but I assume "mysandbox" could remain undetected (enabled/disabled) for whatever reason.

IshKebab
0 replies
11h9m

This is surprisingly obvious. I mean, it's a clever technique to disable the feature and a really plausible commit. But then why did they go with an obvious syntax error instead of just misspelling something? E.g. would you have spotted it if the mistake had been `PR_SET_NO_NEW_PRIV`?

More plausibly deniable too.