Answer: https://git.tukaani.org/?p=xz.git;a=commitdiff;h=f9cf4c05edd...
Description of Linux's Landlock access control system if you are not familiar with it: https://docs.kernel.org/userspace-api/landlock.html
xz official (maybe...) incident response page: https://tukaani.org/xz-backdoor/
So that function checked whether the following C code compiled, and only in that situation enabled Landlock?
Except that lone period, hard to spot because of its small size and proximity to the left edge of the diff, made the C code always invalid, hence keeping Landlock always disabled?
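For reference, the kind of probe such a check feeds to the compiler looks roughly like this. A minimal sketch, assuming CMake's check_c_source_compiles() and recent Linux headers; it is a reconstruction, not the exact xz snippet:

    /* Sketch of a Landlock feature probe, of the sort a build system
     * writes to a temporary file and tries to compile. If it builds,
     * Landlock support gets enabled; if not, the feature is silently
     * dropped. The sabotage prepended a lone "." above the first
     * include, which is never valid C, so the probe always failed. */
    #include <linux/landlock.h>
    #include <sys/syscall.h>

    int main(void)
    {
        /* The probe only needs the symbols to exist; it never runs. */
        (void)SYS_landlock_create_ruleset;
        (void)LANDLOCK_CREATE_RULESET_VERSION;
        return 0;
    }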
That's both vilely impressive and impressively vile. I didn't even spot it on my first read-through.
I just got a little more respect for pythonic whitespace-sensitivity
EDIT: come to think of it, even that might not have done much here, where well-formedness is the issue :(
Yeah, if anything, Python worsens the situation. I had a friend DOS our server because he accidentally inserted a tab, creating the illusion that a statement was inside a block when it was actually outside it. He swore off Python at that point. I personally avoid the language, but I understand that, due to issues like that, mixing tabs and spaces is now an error (or is it just a warning?) by default. Regardless, it still seems pretty silly to me to have whitespace play such a significant role, besides the fact that I find it visually harder to read, like text without punctuation or capitalisation.
Whitespace as a delimiter is why I never use Python.
Whitespace is a delimiter in (almost?) all languages humans use.
Whitespace determining which scope you’re in is one of the many problems of making whitespace significant, which might be what you meant.
Regarding your "almost?" query: there was a debate over the Ogham space mark in Unicode. It is considered whitespace, with the rationale that it is sometimes visible and sometimes invisible, depending on whether the text has a stem-line. That doesn't prove the set of non-whitespace-delimited languages is empty; perhaps there is one with an always-visible delimiter that didn't get the whitespace justification. But it does at least give one human language delimited by a printable character (which happens to be whitespace).
Well, there's also the opposite: Goldman Sachs's Slang allows space as part of identifiers.
Many SQL implementations permit whitespace in identifiers, but then you need to use quoted identifiers.
That's reasonably sane of SQL. In Slang, you don't need to quote. (The syntax is still unambiguous. In principle, e.g. Python could do something similar, because it doesn't have any existing syntax where you just put two identifiers next to each other with only a space in between. But C could not, because a variable declaration is just two identifiers, one for the type and one for the variable name, next to each other with a space in between, as sketched below.)
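A toy illustration of that point about C (hypothetical names, obviously):

    /* Why C can't have spaces in identifiers: a declaration is
     * literally one identifier (the type) followed by another
     * (the name), separated only by whitespace. */
    typedef int my;

    int main(void)
    {
        my x = 0;   /* today: unambiguously "declare x of type my" */
        /* If "my x" could also be one identifier, the parser could
         * no longer tell a declaration apart from a mention of a
         * single space-containing variable. */
        return x;
    }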
In Slang, are “x y” with different numbers of spaces in the middle different identifiers, or different spellings of the same identifier? The SQL standard says different identifiers.
The interaction with keywords would cause some issues. For example, right now “if” is not a valid identifier (it's a keyword), but “if_” and “_if” are. However, with this proposal “x y” could be a valid identifier, but “x if” would introduce ambiguity.
This is one aspect of C syntax I have never liked. I always wish it had been Pascal-style `x:int;` instead of `int x;`
Sorry, I don't remember, and I can't seem to find it out online. I could ask my friends who still work there, if it's important. (For what it's worth, I never remember anyone complaining about mixing up different numbers of spaces in their variable names. So either all numbers of spaces were treated the same, or perhaps multiple spaces in a row in an identifier were just banned (via linter or via the language?).)
Yes, you would need to sort out these details, if you wanted to add this 'feature' to Python.
I'm glad Rust made the same decision.
I do like using the 'space operator' to denote function calls, at least for a language like Haskell or OCaml.
“The thing that controls scope is the count of certain nonprinting characters, which happen to come in multiple widths” is reasonably insane, yes
Which non-printing characters are you talking about? Whitespace characters are very much printable.
Yes, I agree that Python should just forbid tabs. As a second best, you can tell your linter (and/or formatter) to forbid tabs (and any weird unicode whitespace). That's basically the equivalent of compiling your C with -Wall.
This was an issue in Python 2, where for some dumb reason it allowed mixing tabs and spaces and equated tab to 8 spaces (I think). Python 3 doesn't have that issue.
Human language is also way more ambiguous. One of the reasons I love coding is a massively reduced vocabulary and a way more strict grammar.
Mixing tabs and spaces usually throws a runtime exception. I'm not gonna make a value judgement about that, but your story doesn't make sense based on how I understand py3
Edit: sorry, shoulda read your whole comment before replying
Yep. It was a few years ago while that was still allowed (as I'd noted ;) ), but regardless. Significant whitespace is just annoying. There are a lot of things that render as whitespace, and source code one might be reviewing could be printed wrapped, or copied and pasted in odd ways. Other languages are more robust to this.
Never had this happen; it's largely eliminated by using tools like black or other autoformatters.
The point still stands that this is an inherent possibility in the language itself.
Well, in C the visual indentation and the meaning of the code (given by {}) can diverge. That's even worse, and happens in practice. Many style guidelines for C have evolved specifically to address that problem with 'the language itself'.
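A minimal sketch of that divergence; this is exactly why so many C style guides mandate braces on every if:

    #include <stdio.h>

    int main(void)
    {
        int authenticated = 0;

        if (authenticated)
            printf("access granted\n");
            printf("access logged\n");  /* indentation suggests this is
                                           guarded by the if, but only
                                           the first statement is */
        return 0;
    }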
I think it's more of a tooling issue. If your editor / diff viewer can't differentiate between different whitespace characters, get a better one. Also, if you want to ensure some function isn't in local scope etc., just test it.
I feel that the main point still stands, though. Saying that Python doesn't have a whitespace problem because you can send the code through a tool that detects whitespace-related problems still acknowledges the existence of said problem.
And/or linters.
This issue is no worse than having multiple different variables that look the same and differ only in their obscure same-looking Unicode symbols.
A linter can handle this. As can a decent font, or editor settings.
You can enable showing whitespace characters in your editor; in PyCharm they are visualised well enough not to distract from the non-whitespace.
Like what? As you note, mixing tabs and spaces is now an error.
I've never understood the objection to semantic whitespace. Complaining about having to indent code correctly strikes me as being akin to complaining about having to pee in the toilet instead of the bathtub.
Yes, mixing tabs and spaces is a big no-no and rightfully throws an error now
Humans are weird creatures sometimes. There was this bad thing that happened that won't happen again now, but I still can't use the thing forever, because Reasons.
AST diffs instead of textual diffs might have helped here (to spot the `.` making the code un-compilable).
Edit: oof, though the stray character in question is inside a perfectly legitimate C string, so to catch this, any such diffs would need to Matryoshka down, and that seems unsolvable / intractable.
Ah yes, but thanks to C being cursed with includes and macros, this is harder to do
Huh, the code with the dot is not legal C. It's a CMake issue that the test breaks here.
That’s what makes this so clever: these systems were born in the era where you couldn’t trust anything - compilers sometimes emitted buggy code, operating systems would claim to be Unix but had weird inconsistencies on everything from system calls to command line tool arguments, etc. - so they just try to compile a test script and if it fails assume that the feature isn’t supported. This is extremely easy to miss since the tool is working as it’s designed and since it’s running tons of stuff there’s a ton of noise to sort through to realize that something which failed was supposed to have worked.
You could just run tests for the feature detection on a known system, or a dozen (VMs are cheap). The big problem is that most code is not tested at all, or test errors are flat out ignored.
Nothing “just” about it. VMs weren’t cheap when the C world started using autoconf - that was a feature mostly known on IBM mainframes – and in any case that couldn’t do what you need. The goal is not to figure out what’s there on systems _you_ control, it’s to figure out what is actually available on the systems other users are building on. Think about just this library, for example, your dozen VMs wouldn’t even be enough to cover modern Linux much less other platforms but even if it was (drop most architectures and only support the latest stable versions for the sake of argument), you’d still need to build some reliable prober for a particular bug or odd configuration.
The problem here wasn’t the concept of feature detection but that it was sabotaged by someone trusted and easily missed in the noise of other work. What was needed was someone very carefully reviewing that commit or build output to notice that the lockdown feature was being disabled on systems where it was fully enabled, and we already know that maintainer time was in short supply. Any other approach would likely have failed because the attacker would have used the same plausible sounding language to explain why it needed to be a complicated dynamic check.
The goal is to figure out if the optional features work at all. For that a system you control is required.
That the sabotage worked at all relied on the fact that nobody was testing those features.
How secure is a sandbox that may not even exist? Apparently it's good enough for most.
That's kind of the point. It's feature-detection code. If the code cleanly compiles, the feature is assumed supported; otherwise, it's assumed not present/functional. This is pretty common with autotools. The gotcha here is that this innocuous period makes the snippet never syntactically valid. It's meant to be subtle and to always disable the feature.
Shouldn't whatever is depending on xz supporting landlock be verifying that it's the case through blackbox tests or something? Otherwise a check like this even without the bug could end up disabling landlock if e.g. the compiler environment was such that a given header wasn't available...
Yes, this is intentionally how autoconf is supposed to work.
I think the main issue was that it was embedded in the file itself like that. I would have preferred to have it in a separate valid C file, with syntax highlighting etc., and have it parsed from that file.
Perhaps, but given how most build systems work, that would complicate things in other ways (since build systems often try to compile all .c files).
I am a CMake novice. Is that true for CMake in this example?
Definitely not, cmake's try_compile function even directly supports being passed a source file.
Put it in a special folder that is ignored from the build.
Not sure; you could also just forbid code that's too complex to analyse without going down a rabbit hole, instead of trying to go down the rabbit hole.
In general, it's hard to analyse arbitrary code. But we don't have to allow arbitrarily complex code when doing code review.
In Python, you only need to indent the `def test_foo` by additional whitespace to make it a locally scoped function instead of a proper test function.
No, not if you have any sane linter or formatter involved. They wouldn't let you get away with indenting by a single space, but only by multiples of whatever you normally use in the rest of your program.
I mean, some CI system should be checking that "if_code_compiles()" blocks compile somewhere. It should be an error until the CI system has that header and can test both variants.
People are really quick to add optionality like this without understanding the maintenance cost. (Every boolean feature flag increases the number of variants you need to test by 2!) Either make a decision, or check that both sides work. Don't let people check in dead code.
multiplies it by 2, not increases
yup, meant to say "a factor of 2".
Even more evil would have been to replace that line with one where, for those squinting, the regular "c" in "landlock" is replaced with a Cyrillic "с" (U+0441).
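A hedged sketch of what that looks like in practice (recent GCC and Clang accept raw UTF-8 in identifiers by default, so both declarations below compile):

    /* Homoglyph demo: the two identifiers below render identically in
     * many fonts, but the second uses a Cyrillic "с" (U+0441), so they
     * are distinct variables. */
    int landlock = 1;   /* ASCII "c" */
    int landloсk = 0;   /* Cyrillic "с" */

    int main(void)
    {
        /* Building with gcc -fno-extended-identifiers would reject the
         * Cyrillic identifier outright instead of silently accepting it. */
        return landlock + landloсk;
    }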
Is there a GCC option to error on non-standard English characters?
Erroring is the point here, and it's what the period achieved. It's a feature-detection snippet, so if it fails to compile, the feature is disabled.
It seems like there should be a way to catch these types of “bugs” - some form of dynamic analysis tool that extracts the feature detection code snippets and tries to compile them; if they fail for something like a syntax error, flag it as a broken check.
Expanding macros on different OSes could complicate things though, and determining what flags to build the feature check code with — so perhaps filtering based on the type of error would be best done as part of the build system functionality for doing the feature checking.
A syntax error might be exactly what they’re looking for e.g. they’re feature testing a new bit of syntax or a compiler extension.
Which would require every compiler to have detailed, consistent, and machine-readable failure reporting.
At least for newer C++ standards it seems like there is decent support for feature test macros, which could reduce the need for a feature check involving compiling a snippet of test code to decide if a feature is available: https://en.cppreference.com/w/cpp/feature_test
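C has something in the same spirit: `__has_include` (a long-standing GCC/Clang extension, standardized in C23) lets the preprocessor ask about header availability without a separate compile probe. A sketch, using the Landlock header from this thread as the example:

    /* Sketch: detect a header in-source instead of via an external
     * "does this compile" probe. __has_include is a GCC/Clang
     * extension that was standardized in C23. */
    #if defined(__has_include)
    #  if __has_include(<linux/landlock.h>)
    #    define HAVE_LANDLOCK_HEADER 1
    #  endif
    #endif

    int main(void)
    {
    #ifdef HAVE_LANDLOCK_HEADER
        return 0;   /* the sandbox code path would be compiled in here */
    #else
        return 0;   /* fallback, but visibly so: this branch is greppable */
    #endif
    }

Of course this only proves the header exists, not that the kernel feature actually works, so it doesn't replace every probe.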
Handling the output from several of the most recent GCC and Clang versions would probably cover the majority of cases, and add in MSVC for Windows. If the output isn’t from a recognized compiler or doesn’t match expectations, the only option is falling back to current behavior. Not ideal, but better than the current status quo…
That would only work for projects that care only about current compilers, whereas the C world in general has more desire to support niche compilers.
A mitigation here would be to make the result of autoconf only provide instructions for humans to change their build config, instead of doing it for them silently. The latter is an anti-pattern.
FWIW, the approach you propose is how the UI tests for rustc work, with checks for specific annotations on specific lines, but those have the benefit of being tied to a single implementation/version and modified in tandem with the app. Unless all compilers could be made to provide reasonable machine readable output for errors, doing that for this use case isn't workable.
Universal support of SARIF by compilers would be most of that.
I'd prefer it if the ecosystem standardized on some dependency-management primitives, so critical projects aren't expected to invent buggy and insecure hacks ("does this string parse?") in order to accurately add dependencies.
It would be interesting to see what the most common compile feature checks are for, and see what alternative ways could be used to make the same information available to a build system — it seems like any solution that requires libraries being updated to “export” information on the features they provide would have difficulties getting adoption (and not be backwards compatible with older versions of desired dependencies).
From my experience looking at rust builds, Pareto applies here: most checks are trivial and used by a lot, and a handful are super complex and niche.
Probably the code review tools should be hardened as well, to indicate when extended identifiers have been introduced on a line where there weren't any before. That would help catch the replacement of a 'c' character with a Russian one.
Btw, the -fno-extended-identifiers compiler parameter gives an error if UTF-8 identifiers are used in the code. For `float ω₃ = 0.5f;` it reports: `<source>:3:11: error: stray '\317' in program`
Maybe in the future more languages/ tools will have the concept of per-project character sets, as opposed to trying to wrangle all possible Unicode ambiguity problems.
I suppose then the problem is how to permit exceptions when integrating with some library written in another (human) language.
Disclosure: I've got zero C/C++ on my resume. I was asked to diagnose a kernel panic and backport kernel security patches once, but it was very uncomfortable. ("Hey, Terr knows the build system, that's close enough, right?")
That said, perhaps something like disabling the default -fextended-identifiers [0], and enabling the -Wbidi-chars [1] warning.
[0] https://gcc.gnu.org/onlinedocs/gcc/Preprocessor-Options.html...
[1] https://gcc.gnu.org/onlinedocs/gcc/Warning-Options.html#inde...
Cool, the latter was added to fix CVE-2021-42574: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103026
Not yet. I was working on the standardization for C23, but this was postponed to C26, and it doesn't have much support. MSVC and SDCC liked it; Clang and GCC, not so much.
Here is my bad variant of the feature, via confusables: https://github.com/rurban/gcc/tree/homoglyph-pr103027
The better variant would be to use my libu8ident, following UTR 39. I only did that for binutils.
Well - that would be exactly the point of the attacker. GCC errors out, and it does not matter whether this is because the intentionally typoed header does not exist or because non-English characters are not allowed. Errors encountered during compilation of test programs go to /dev/null anyway, as they are expected not to compile successfully on some systems - that's exactly the point of the test. So no, this option would not have helped.
I mean, GCC erroring is how the exploit works here. cmake tries to compile the source code: if it works then the feature is available, if it fails then the feature is not available. Forcing the build to fail is what the attacker wants.
The -fno-extended-identifiers option seems to do something in this area, but I don't know if it is sufficient. But it may block some characters which are used in standard English (for some values of standard).
Yip, this was impressive. Only a copy & paste into VSCode (Windows) revealed the rectangle around the "c" in the second line.
Impressive.
The compiler would perhaps "see" it?
Sure; but so long as the compiler error is silently discarded (as it is here), the configure script will assume landlock isn't available and never use it.
A misplaced punctuation mark has some plausible deniability. The author could say he was distracted and fat-fingered some nonsense; an honest mistake.
A Unicode character from a language that has zero chance of being mapped to your programmer's keyboard, in the middle of a code line, would be obvious intentional tampering, or at the very least would raise some eyebrows.
I accidentally get a Cyrillic "с" in my code a few times a year. It's on the same key as the Latin "c", so if I switch keyboard layout, I don't notice until the compiler warns me. Easy to do if I switch between chats and coding. Now my habit is to always type a few characters in addition to "c" to check which layout I'm _really_ using.
Granted, it's easier with a one-letter variable called "c", but with longer names I can easily see myself not typing "c" the first time (e.g. not pressing hard enough), starting a build, chatting in some windows, getting back, facepalming, and "fixing" the error by adding "c" or "с" depending on my keyboard layout.
Even worse, I have Punto Switcher, which automatically switches language when I start typing. With the default config, it changes the Latin c to the Russian one, because Russian includes the word "с" while it's nonsense in English.
And since it's the only Cyrillic character placed on the same key as the identical-looking English character, I don't even notice the problem when the autoreplacement fires.
I mean, I agree that the punctuation mishap is a better cover story, but why would any particular language have “zero chance” of being mapped to the programmer’s keyboard?
You left out a very important qualifier: in the middle of a code line
The chances of someone accidentally fat-fingering a keyboard layout switch, typing one character, then switching back again without noticing while typing a line of code are very slight indeed.
Plus the fact that you'd need to speak a language written in that alphabet to have a plausible reason for having the keyboard layout enabled in the first place.
With the . the author can claim it was unintentional. It's impossible (or at least very hard) to claim the Cyrillic c is unintentional.
To me, that makes the . more evil.
Well, on Russian keyboards, they are on the same key.
Good point - it might be plausible if the author actually were Russian, but they aren't.
Plausible deniability
Putting random Unicode confusables in source code would be far easier to consider malicious
Except many people have a policy of ASCII only in source code and therefore would catch it immediately.
After the concept of such attacks blew up, I don't know, one, two, three years ago, a lot of places now warn at least when "languages"/ranges are mixed, or sometimes on any usage of non-US-ASCII printable characters.
So stuff like that is increasingly reliably caught.
Including by 3rd parties doing any ad-hoc scanning.
Though "does it compile" checks failing with syntax errors should definitely be added to the list of auto-scan checks, at least in C/C++ (it's not something you would do in most languages, tbh; even in C it seems to be an antipattern).
Nearly every modern IDE and diff viewer instantly highlights this though? I doubt this would get far.
It's also iffy, tbh, that the compilation-check functionality:
- doesn't force users to differentiate between syntax errors and other errors (e.g. symbols not found); partially you can't even make that work properly if you want to, due to C macros.
- seems sensible at all; if "does it compile" checks for compatibility seem sensible, then there is so much wrong with the ecosystem in so many ways.
It's standard autoconf stuff, BUT... the right thing to do for a security option should be to throw an error at the configure stage if the option is requested but can't be supported, and not to silently turn it off.
That is because a human should have to look at the build and manually make the decision to switch off a security feature from the build.
Autoconf output is so incredibly noisy though; still long odds someone would notice anytime soon.
If you break the build they’ll notice
Good thing they disabled the check that would have broken the build/tests in this case eh?
I agree. The way I read this, I missed it completely until I searched for it. I was looking too closely at the rest of the code around defines
It also shows how brittle these sorts of checks are in general. I really think that this auto-disabling of features based on compilation was a big mistake. It is not only very annoying when packaging things (Why does feature X not work? Because you forgot to include libY in the build environment, of course), but the checks also fail silently even when that isn't intended. While this `.` was almost certainly malicious, it is easy to imagine an accidental break: maybe a minor typo, or `-Werror` is enabled and one of these functions was marked as must-use, or something similar.
It seems like it is infinitely better to have a default set of features and explicitly require the user to say `--enable-x --disable-y` as needed. Then error out if a required library isn't available. (Preferably saying which feature requires it.)
This has plausible deniability on it.
There are better ways to hide, by swapping in Unicode lookalike characters. Some of them even pixel-match, depending on the font.
Maybe I'm out of the loop but is intentionality settled here?
Unicode lookalikes would be detected by IDEs and other tools.
There would be plausible deniability in a different situation, but this is the same author who implemented the backdoor and several similar changes that disable security features. I don't think the benefit of doubt is deserved here.
Ok there's a larger question here about the bazaar software development method.
How can we ensure say, that Microsoft doesn't pay someone to throw a wrench in libre office development or Adobe to sabotage Gimp?
There's lots of deception strategies for bad faith actors and given the paucity of people who actually do the work, it's really really hard to be picky.
Especially with the complexity of library dependencies. Defect-at-a-distance may actually be a common strategy, and this is just the first one that was caught.
Microsoft and Adobe have reputations to uphold long into the future.
Is that infallible? Hell no it isn't, but consider that Jia Tan only needed to uphold his reputation insofar as getting his backdoor onto everyone's systems. Once that is done, his reputation or the lack thereof becomes a non-issue because his work is done. We're lucky we caught him just before the finish line.
The likelihood of demonstrably reputable parties like Microsoft and Adobe poisoning the well is practically nil because they don't have a finish line.
Those companies are famous for skullduggery.
They can secure the dominance of their offering against the open-source competition for well under 500k a year. It's a no-brainer.
What this might look like would be, say, poorly discernible icons, clumsy UI design, or an unstable API that makes plugins constantly break. Large volumes of documentation that are inadequate or inaccurate in critical places, etc.
If I was malicious I'd pay someone to write otherwise functional code and bug fixes but make really user hostile decisions whenever possible.
We should be vigilant about this in a few key projects. Tech companies could easily be doing this already.
1. Identify particularly unproductive employees. The people who really muck things up.
2. Make them contribute to a competing open source project.
3. Profit!
3. Discover the employee was not unproductive because he was bad but because of internal bureaucracy
It would be hard for them to not get caught with their hands in the cookie jar - corp employees have high turnover, and low loyalty past their employment.
Even paying a cutout consulting company to do it would be iffy: since so many finance employees would be involved in paying them, it would leak sooner rather than later, or at least raise a lot of questions that would be hard to answer legitimately. Being a public company also makes it hard to avoid auditor scrutiny.
Even a private company would have an easier time.
Nation state actors though? No such issues - that’s literally why security clearances, compartmentalization, and ‘you never really leave’ are done the way they are.
And they don’t have to worry about market blowback unless they get really excessively heavy-handed (like France did in the late '80s-'90s). They are set up to do shady stuff; it's their raison d'être.
[https://en.m.wikipedia.org/wiki/Industrial_espionage#:~:text....]
There are levels of skullduggery. Hiring someone to pretend to work for a competitor while secretly sabotaging them is a whole other level of skullduggery, with a lot of liability attached. I don't think that would be worth it to them.
Well since you seemingly want to paint Microsoft and other such companies in a bad light, let me point out to you that it's actually Microsoft who brought awareness to this very problem: Andres Freund works at Microsoft.[1][2]
It is probably prudent for you (and other like-minded individuals) to be more careful who you think your enemies really are. Infighting against friends and allies, or even neutral parties, is exactly what your real enemies would want you to do.
[1]: https://www.linkedin.com/in/andres-freund
[2]: https://twitter.com/AndresFreundTec
You say all this after recent documents were revealed about Facebook intercepting and analyzing Snapchat encrypted traffic via a man-in-the-middle conspiracy.
What kind of a reputation do you think Facebook has?
I'm not sure that an IDE will catch a syntax error in C code quoted inside a cmake script trying to test if things compile.
A lot of IDE configurations, such as VSCode on the default config, highlight Unicode characters that are invisible/look like others (eg. cyrillic characters or non-breaking space) or even all non-ASCII characters, regardless of whether they're in a string or not.
Try pasting any of these into your IDE and see if it catches it. https://gist.github.com/StevenACoffman/a5f6f682d94e38ed80418...
In either case, it would appear as added code, not different characters in the same code, so maybe about the same. If someone changed one character in an existing string though, I think it would be more likely caught visually by someone used to seeing accidental garbage show up
the period right there on the left edge? if I saw that in a patch I'd be through the roof, that looks completely intentional
I love this type of 20/20-hindsight comment. "If I saw...". That is a BIG if. Plenty of smart people on that mailing list also missed it. I missed it myself when I opened the HN lead link. Very subtle.
I don't use mailing lists for patches; I use Gerrit, which would put a great big highlight over that character.
Double spacebar gives you period in some contexts on Mac.
Not me. Maybe someone is using an editor with a . key mapped to something. It's in a pretty convenient place.
:wq!
I'd never suspect this to be intentional if I spotted it in a patch, even given the consequences in this particular case. I have, several times, typed things into code that were meant for some other window, and committed them. Without a linter I probably wouldn't spot an extra dot when reviewing my own change before sending it out.
I distinctly remember having to remove such a superfluous . that I accidentally added into a file on multiple occasions.
If you are using vi and your left shift key is dodgy, that can easily happen by accident.
Yes. The exploit used a payload that was stored in the tests directory as a binary compression test case, but which is very clearly an intentional exploit payload.
All the sneaky stuff was about deploying that payload without it being obvious.
What does the dot do?
The function is “check_c_source_compiles”. The comment indicates that the intention is to confirm that the Landlock functionality can be compiled on the system, in which case it will be enabled.
The stray dot isn’t valid C, so it will never compile. By ensuring it can never compile, Landlock will never be enabled.
configure would print that it's not enabled, so it seems like the kind of thing people would eventually notice.
Maybe, eventually, but how many people read the reams of garbage autoconf spouts out until the feature they wanted fails to materialize?
The actual compiler output from the autoconf feature test is one of the things I'd probably look at fairly early on if some feature is disabled when it shouldn't be, but I maybe have a bit more experience running into problems with this than younger folks.
That's my point, you're going to look at autoconf's output if a feature you're expecting is missing.
But would you think to have a test or expectation for landlock in xz? How would you even check if landlock is enabled for a process?
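(Whether some other process actually restricted itself is hard to observe from outside, but kernel support at least has a documented probe: per the Landlock docs, creating a ruleset with the LANDLOCK_CREATE_RULESET_VERSION flag returns the ABI version. Roughly:

    /* Sketch: query Landlock support via the kernel's documented
     * ABI-version probe. Returns >= 1 if Landlock is available,
     * -1 (e.g. ENOSYS/EOPNOTSUPP) if not. Linux-only, needs
     * reasonably recent headers. */
    #include <linux/landlock.h>
    #include <sys/syscall.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        long abi = syscall(SYS_landlock_create_ruleset, NULL, 0,
                           LANDLOCK_CREATE_RULESET_VERSION);
        if (abi < 0)
            printf("Landlock not available\n");
        else
            printf("Landlock ABI v%ld\n", abi);
        return 0;
    }

But that only tells you about the kernel, not about whether xz ever called it.)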
I used to and it's actually how I got a large part of my computer skills, but unfortunately I got medicated for ADHD and became normal and can't do it anymore.
But I still think reading boring logs is a great way to understand a system. Turning on all the intermediates for a compiler (llvm ghc etc) is very educational too.
Looks like he only introduced that "bug" in CMake. Was there a migration in progress from autoconf to CMake?
Shouldn't there be a unit test to confirm landlock is on/off? (I mean, this seems a crucial aspect of the code which needs 100% test coverage.)
This is not something that a unit test can catch. First, this 100% coverage rule applies to the program/library code, and only to the subset that is not ifdeffed out (otherwise, you will not be able to have, e.g., Windows-specific code), and definitely not to the build system's code. Second, how would you test that landlock works, in a unit test, when this feature is optional and depends on the system headers being recent enough? You can't fail a unit test just because the software is being compiled on an old but still supported system, so it would be a "SKIPPED" result at best, which is not a failure and which is not normally caught.
Fails the build, so that Landlock support is never enabled.
I don't think it fails the build. It's part of a test, trying to compile a bit of code. If it compiles, the test is true and a certain feature is enabled. If the compilation fails, the feature is disabled.
This is quite common in build systems. I had such a test produce incorrect results in the kernel build system recently. The problem is that the tests should really look carefully for an expected error message. If the compilation fails with the expected message the test result is false. If the compilation fails for some other reason the build should fail so the developer is forced to investigate.
Disclaimer: I have not studied the xz build system in detail, just what I saw from the diff plus a bit of extrapolation.
Problem is that every compiler/compiler version will have different messages for the same error. And xz is the kind of project that gets built on really really random homegrown compilers. Maybe we need an option to make the compilers output a standard JSON or whatever...
While I like the idea, I fear this would go the same route as UserAgent strings, with everybody giving carefully crafted lies. The main issue is that bugs are only known in retrospect.
As an example, suppose Compiler A and Compiler B both implement Feature X. Therefore, they both produce JSON output indicating that they support Feature X. A year later, developers for Compiler A find and fix a bug in its implementation of Feature X. Now, how do we handle that case?
* Update the version for Feature X as by Compiler A? Now Compiler B is publishing the same version as the buggy Compiler A did, and would get marked as not having a usable implementation.
* Update the spec, forcing a version update for Feature X in both Compiler A and Compiler B? Now a bugfix in one compiler requires code updates in others.
* In the build system, override the JSON output from Compiler A, such that Feature X is marked as unavailable for the buggy versions? Now the JSON is no longer trustable, as it requires maintaining a list of exceptions.
* Try to trigger the bug that Compiler A just fixed, so that the build system can recognize the buggy versions, regardless of the JSON output? Now we're back to the starting point, with features determined based on attempted compilation.
It fails that small build that is testing for landlock functionality. Hence it doesn’t build the support for it.
It doesn’t fail the overall build.
It feels like this is a poor way to determine if a feature should be enabled. There are just too many reasons it could fail, and default off on compilation failure just seems like a glaring security bug. If this is a common practice, it seems like the community should rethink this choice.
It seems like Lasse Collin is back on the scene and maybe pissed?
If he is innocent and a victim as much as everyone else in all this, I won't blame him for wanting blood.
Most of us are humans after all, and being social creatures we tend to take violations of trust quite deeply.
Truly seems that way currently. He said he'd really dig in starting next week and just checked his email on vacation and saw this whole mess.
It may be hard for him to re-establish trust. Maintaining xz for more than a decade then doing this would be quite a "long con" but if HN threads are any indication, many will still be suspicious.
His commits on these links look legit to me. It's a sad situation for him if he wasn't involved.
The fact GitHub suspended his account too suggests that they might have info saying he is involved.
I think it was somewhat irresponsible to block everything. It hampers instigation of the repo's history. It's good that Lasse had another mirror.
Woops, too late to edit this comment and say *investigation
Personally, I doubt that. I would assume that GitHub just banned all accounts that were directly associated with the project as a precaution.
Or you know, they just reacted with a ban hammer on all accounts related to xz and to heck with innocence.
Honestly, he should call it quits and just drop xz utils and move on with life. He maintained it for 15 years and struggled to get anyone else to help until Jia showed up. Meanwhile the Linux ecosystem depended on it.
The Linux ecosystem will either figure shit out and maintain it or move into a different archive library.
I worry about his home/personal systems; this guy seems to have befriended Lasse in order to get the position. Is it possible Lasse himself has been compromised?
I'd be nuking everything, possibly even disposing of hardware and setting up a new verifiable online identity on a clean system/hardware.
I feel like my version control system is better about highlighting the changed characters than these solid green or red strings.
I don't love unified diffs as a rule either. They're very noisy to read in general.
A side by side in something like "meld" will highlight changes but also means you're reading the full context as it will exist in the code base.
My number one complaint about "code review" is the number of people who simply run their eyes over the diff (in private orgs) because management says "code review", and that's as far as they ever take it in terms of defining outcomes.
I love unified diffs for many applications. You are right that side by side diffs have their uses, too.
It's git, so it can show a vastly better diff, just not from a URL with hardcoded diff settings.
Yeah, but I’m wondering where the code review was done. In a different context, would this be easier to spot?
It’s a giant block of new code, what is it supposed to do beyond tell you it’s a giant block of new code?
Note that this is C code inside of a CMake string; even if your diff can do highlighting, the odds it would highlight that are low, and if it did, highlighting is generally just lexing, so there wouldn't be much to show.
This is so vile that even if caught red-handed during a PR, one could shrug it off with "oh, my IDE's auto-formatting did this".