
Rust for Filesystems

ysw0145
39 replies
1d7h

Having more options available in the Linux kernel is always beneficial. However, Rust may not be the solution for everything. While Rust does its best to ensure its programming model is safe, it is still a limited model. Memory issues? Use Rust! Concurrency problems? Switch to Rust! But you can't do everything that C does without using unsafe blocks. Rust can offer a fresh perspective to these problems, but it's not a complete solution.

bilekas
15 replies
1d7h

Concurrency problems?

I have to admit, while I do enjoy Rust in the sense that it makes sense and can really "click" sometimes, for anything asynchronous I find it really rough around the edges. It's not intuitive what's happening under the hood.

asyx
7 replies
1d6h

I really hate async rust. It's really great that rust forces you on a compiler level to use mutexes but async is a disease that is spreading through your whole project and introduces a lot of complexity that I don't feel in C#, Python or JS/TS.

John23832
6 replies
1d6h

Eh, syntactically async rust is the exact same as C#. It's all task based concurrency.

Now, lifetimes attached to function signatures is definitely a problem.

colejohnson66
5 replies
1d5h

Not really. C#'s Task/Task<T> are based on background execution. Once something is awaited, control is returned to the caller. OTOH, Rust's Future<T> is, by default, based on polling/stepping, a bit like IEnumerable<T> in C#; If you never poll/await the Future<T>, it never executes. Executor libraries like Tokio allow running futures in the background, but that's not built-in.
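To make the laziness point concrete, here is a minimal sketch using only the standard library. The names `NoopWake`, `poll_once` and `lazy_demo` are made up for this illustration; a real program would normally hand the future to an executor such as Tokio rather than polling it by hand.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// A waker that does nothing; enough to poll a future once by hand.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Polls `fut` exactly once and reports whether it finished.
fn poll_once<F: Future>(fut: F) -> Poll<F::Output> {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    let result = fut.as_mut().poll(&mut cx);
    result
}

/// Returns whether the async body had run (before polling, after polling).
fn lazy_demo() -> (bool, bool) {
    let ran = Arc::new(AtomicBool::new(false));
    let flag = Arc::clone(&ran);

    // Creating the future runs none of its body.
    let fut = async move {
        flag.store(true, Ordering::SeqCst);
    };
    let before = ran.load(Ordering::SeqCst);

    // Only polling drives the body forward.
    let _ = poll_once(fut);
    let after = ran.load(Ordering::SeqCst);

    (before, after)
}

fn main() {
    let (before, after) = lazy_demo();
    println!("ran before poll: {before}, after poll: {after}");
}
```

If `poll_once` is never called, the flag simply stays `false`, which is the behavior described above.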

John23832
3 replies
1d4h

I don't want to "well actually" the "well actually", but I think you missed the word syntactically.

C#'s Task/Task<T> are based on background execution. Once something is awaited, control is returned to the caller.

Async/await in any language happens in the background.

What happens during a Task.Yield() (C#)? The task yields to another awaiting task in the work queue. Same as Rust.

OTOH, Rust's Future<T> is, by default, based on polling/stepping,

The await syntax abstracts over Future/Stream polling. The real difference is that Rust introduced the Future type/concept of polling at all (which is a result of not having a standard async runtime). There is a concept of "is this task available to proceed on" in C# too, it's just not exposed to the user and handled by the CLR.

merb
2 replies
22h47m

Task.Yield()

In c# you probably never call yield.

neonsunset
0 replies
21h28m

Yield in C# is frequently used for the same reasons as in Rust, although implementation details between fine-grained C# Tasks and even finer grained Rust Futures aggregated into large Tasks differ quite a bit.

Synchronous part of an async method in C# will run "inline". This means that should there be a computationally expensive or blocking code, a caller will not be able to proceed even if it doesn't await it immediately. For example:

    var ptask = Primes.Calculate(n); // returns Task<ulong[]>
    // Do other things...right?
    // Why are we stuck calculating the primes then?
    Console.WriteLine("Started.");

In order for the .Calculate to be able to continue execution "elsewhere" in a free worker thread, it would have to yield.

If a caller does not control .Calculate, the most common (and, sadly, frequently abused) solution is to simply do

    var task = Task.Run(() => Primes.Calculate(n));
    // Do something else
    var text = string.Join(',', await task);

If the return type of the delegate is also a Task, the result will be flattened into just a Task<T>, but the returned task will nonetheless be a proxy that completes once the original task completes. This successfully deals with badly behaved code.

However, a better solution is to instead insert `Task.Yield()` to allow the caller to proceed and not be blocked, before continuing a long-running operation:

    var ptask = Primes.Calculate(n); // returns Task<ulong[]>
    // Successfully prints the message
    Console.WriteLine("Started.");

    // In the Primes class:
    static async Task<ulong[]> Calculate(int n)
    {
        await Task.Yield();
        // Continue execution in a free worker thread.
        // If the caller immediately awaits us, most likely
        // the caller's thread will end up doing so, as the
        // continuation will be scheduled in the local queue,
        // so it is unlikely for the work item to be stolen this
        // quickly by another worker thread.
        // (The prime calculation itself is elided here.)
    }

John23832
0 replies
21h51m

It was just an example. In practice, you're right.

brigadier132
0 replies
1d2h

How do you imagine async works otherwise? Also, in case you misunderstand how polling works in practice in rust, it's not polling in the traditional web development sense where it polls every 5 ms to check if a future is completed (although you can do this if you want to for some reason). There are typically "wakers" that are "awoken" by the os when data is ready and when they are "awoken" then they poll. And since they are only awoken by the OS when the information is ready it really never has to poll more than once unless there are multiple bundled futures.
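A rough sketch of that waker flow, using only the standard library. Everything here is hypothetical scaffolding: `Event` stands in for an I/O source, a helper thread plays the role of the OS signalling readiness, and `block_on` is a toy single-future executor rather than what Tokio actually does.

```rust
use std::future::Future;
use std::pin::{pin, Pin};
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};
use std::time::Duration;

/// Waker that unparks the thread running `block_on`.
struct ThreadWaker {
    thread: Thread,
    woken: AtomicBool,
}

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.woken.store(true, Ordering::SeqCst);
        self.thread.unpark();
    }
}

/// Future that becomes ready once a helper thread flips `done`.
struct Event {
    done: Arc<AtomicBool>,
    polls: Arc<AtomicUsize>,
    started: bool,
}

impl Future for Event {
    type Output = ();

    fn poll(mut self: Pin<&mut Self>, cx: &mut Context<'_>) -> Poll<()> {
        self.polls.fetch_add(1, Ordering::SeqCst);
        if self.done.load(Ordering::SeqCst) {
            return Poll::Ready(());
        }
        if !self.started {
            self.started = true;
            let done = Arc::clone(&self.done);
            let waker = cx.waker().clone();
            // Stand-in for the OS: mark the data ready, then wake the task.
            thread::spawn(move || {
                thread::sleep(Duration::from_millis(20));
                done.store(true, Ordering::SeqCst);
                waker.wake();
            });
        }
        Poll::Pending
    }
}

/// Toy executor: polls, then sleeps until the waker fires.
fn block_on<F: Future>(fut: F) -> F::Output {
    let state = Arc::new(ThreadWaker {
        thread: thread::current(),
        woken: AtomicBool::new(false),
    });
    let waker = Waker::from(Arc::clone(&state));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
        // The flag filters out spurious unparks, so no busy polling happens.
        while !state.woken.swap(false, Ordering::SeqCst) {
            thread::park();
        }
    }
}

/// Runs one `Event` to completion and reports how often it was polled.
fn count_polls() -> usize {
    let polls = Arc::new(AtomicUsize::new(0));
    let event = Event {
        done: Arc::new(AtomicBool::new(false)),
        polls: Arc::clone(&polls),
        started: false,
    };
    block_on(event);
    polls.load(Ordering::SeqCst)
}

fn main() {
    println!("polled {} times", count_polls());
}
```

In this sketch the future is polled once to register interest and once more after the wake-up, with the executor thread parked in between.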

the_duke
5 replies
1d6h

Async != concurrency.

One of the major wins of Rust is encoding thread safety in the type system with the `Send` and `Sync` traits.
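A small sketch of what that looks like in practice, assuming nothing beyond the standard library (`count_in_parallel` is a name invented for this example). `thread::spawn` requires its closure to be `Send`, so sharing the counter through `Arc<Mutex<_>>` compiles, while the commented-out `Rc` variant is rejected at compile time.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Increments a shared counter from several threads and returns the total.
fn count_in_parallel(threads: usize, per_thread: usize) -> usize {
    // Arc<Mutex<usize>> is Send + Sync, so it may cross thread boundaries.
    let counter = Arc::new(Mutex::new(0usize));

    // let counter = std::rc::Rc::new(std::cell::RefCell::new(0usize));
    // ^ would fail to compile below: `Rc<RefCell<usize>>` cannot be sent
    //   between threads safely (it is not `Send`).

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", count_in_parallel(4, 1000));
}
```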

bilekas
2 replies
1d6h

Async != concurrency.

Right, but tasks are sharing the same thread, which is fine, but when we need to expand on that with them actually working async, i.e. non-blocking, fire and quasi-forget, it's tricky. That's all I'm saying.

the_duke
1 replies
1d4h

The Rust async experience indeed has lots of pitfalls, very much agree there.

dboreham
0 replies
1d4h

s/The Rust/All/

duped
1 replies
1d3h

async == concurrency, concurrency != parallelism.

sophacles
0 replies
22h37m

async == concurrency in the same way square == rectangle - that is, it's not a symmetric '==', since there are plenty of rectangles that are not squares.

wongarsu
0 replies
1d4h

Rust async isn't all that pleasant to use. On the other hand for normal threaded concurrency Rust is one of the best languages around. The type system prevents a lot of concurrency bugs. "Effortless concurrency" is a tagline the language really has earned.

drdo
10 replies
1d7h

But unsafe blocks are available! And you should use them when you have to, but only when you have to.

Using an unsafe block with a very limited blast radius doesn't negate all the guarantees you get in all the rest of your code.

sanxiyn
8 replies
1d7h

Note that unsafe blocks don't have a limited blast radius. The blast that can be caused by a single incorrect unsafe block is unlimited, at least in theory. (In practice the amount of incorrectness may correlate with the effect, but the same could be said about C undefined behavior.)

Unsafe blocks limit the amount you need to get correct, but you need to get all of them correct. It is not a blast limiter.

weinzierl
5 replies
1d7h

Yes, they don't contain the blast, but they limit the places where a bomb can be, and that is their worth.

foldr
4 replies
1d2h

Generally speaking yes, but there could be a logic error somewhere in safe code that causes an unsafe block to do something it shouldn’t. For example, a safe function that is expected to return an integer less than n is called within an unsafe block to obtain an index, but the return value isn’t actually less than n. In that case the ‘bomb’ may be in the unsafe block, but the bug is in the safe code.
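A hedged sketch of that scenario. All three functions are hypothetical: `pick_index` is the buggy safe code, `read_unchecked` is the unsafe consumer, and `read_checked` shows one way to validate the value before it reaches the unsafe operation.

```rust
/// Safe code with a logic bug: it is meant to return an index < n,
/// but returns n itself (a hypothetical off-by-one).
fn pick_index(n: usize) -> usize {
    n
}

/// Unchecked read: sound only if `idx < data.len()`.
unsafe fn read_unchecked(data: &[u8], idx: usize) -> u8 {
    // SAFETY: the caller promises that `idx` is in bounds.
    unsafe { *data.get_unchecked(idx) }
}

/// Checked wrapper: a bad index is reported instead of reaching unsafe code.
fn read_checked(data: &[u8], idx: usize) -> Option<u8> {
    if idx < data.len() {
        // SAFETY: `idx` was just verified to be in bounds.
        Some(unsafe { read_unchecked(data, idx) })
    } else {
        None
    }
}

fn main() {
    let data = [10u8, 20, 30];
    let idx = pick_index(data.len()); // 3, which is out of bounds

    // Calling `unsafe { read_unchecked(&data, idx) }` here would be
    // undefined behavior, even though the mistake is in safe code.
    println!("{:?}", read_checked(&data, idx));
    println!("{:?}", read_checked(&data, 1));
}
```

The unsafe block is where things would go wrong, but the line that needs fixing is in `pick_index`.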

nicce
1 replies
1d1h

yes, but there could be a logic error somewhere in safe code that causes an unsafe block to do something it shouldn’t.

Sounds like bad design. You can typically limit the use of unsafe to so small an area that you can verify the ranges of the parameters that would cause memory problems. Check for invalid values and panic. Still "memory safe", even if it panics.

foldr
0 replies
23h36m

Sure, it may be bad design. The point is that nothing in the Rust language itself guarantees that memory safety bugs will be localized to unsafe blocks. If your code has that property it’s because you wrote it in a disciplined way, not because Rust forced you to write it that way (though it may have given some moral support).

Let me emphasize that I am not criticizing Rust here. I am just pointing out an incontrovertible fact about how unsafe blocks in Rust work: memory safety bugs are not guaranteed to be localized to unsafe blocks.

Klonoar
1 replies
1d1h

I cannot imagine writing a method to return a value less than n, and not verifying that constraint somewhere in the safe method.

foldr
0 replies
23h36m

It’s just a simple example to illustrate the point. Realistic bugs would probably involve more complex logic.

The prevalence of buffer overrun bugs in C code shows that it very definitely is possible for programmers to screw up when calculating indices. Rust removes a lot of the footguns that make that both easy to do and dangerous in C. But in unsafe Rust code, you’re still fundamentally vulnerable to any arithmetic bug in any function that you call as part of the computation of an index.

neysofu
0 replies
1d6h

I believe this is technically true, but somewhat myopic when it comes to how maintainers approach unsafe blocks in Rust.

UBs have unlimited blast radius by definition, and you'll need to write correct code in all your unsafe blocks to ensure your application is 100% memory-safe. There's no debate around that. From this perspective, there's no difference between a C application and a Rust one which contains a single, incorrect unsafe block.

The appreciable difference between the two, however, is how much more debuggable and auditable an unsafe block is. There's usually not that many of them, and they're easily greppable. Those (hopefully) very few lines of code in your entire application benefit from a level of attention and scrutiny that teams can hardly afford for entire C codebases.

EDIT: hardy -> hardly (typo)

drdo
0 replies
1d5h

That is of course correct.

The main value is that you only have to make sure that a small amount of code surrounding the unsafe block is safe, and hopefully you provide a safe API for the rest of the code to use.

CraigJPerry
0 replies
1d7h

I’d word that differently: it reduces the search space for a bug when something goes wrong, but it doesn’t limit the blast radius. You can still spectacularly blow up safe Rust code with an unsafe block (that no-aliases rule is seriously tough to adhere to!)

This is definitely a strong benefit though.

tialaramex
8 replies
1d6h

But you can't do everything that C does without using unsafe blocks

For this particular work the huge benefit of Rust is its enthusiasm for encapsulating such safety problems in types. Which is indeed what this article is about.

C and particularly the way C is used in the kernel makes it everybody's responsibility to have total knowledge of the tacit rules. That cannot scale. A room full of kernel developers didn't entirely agree on the rules for a data structure they all use!

Rust is very good at making you aware of rules you need to know, and making it not your problem when it can be somebody else's problem to ensure rules are followed. Sometimes the result will be less optimal, but even in the Linux kernel sub-optimal is often the right default and we can provide an (unsafe) escape hatch for people who can afford to learn six more weird rules to maybe get better performance.
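As a rough illustration of putting a rule into a type, here is a typestate sketch. It is not the kernel's API: `Inode`, `New`, `Ready`, `allocate`, `init` and `describe` are all invented for this example, and the rule encoded is simply "an inode must be initialized before its size is read".

```rust
use std::marker::PhantomData;

// Marker types for the two lifecycle states (hypothetical).
struct New;
struct Ready;

struct Inode<State> {
    number: u64,
    size: u64,
    _state: PhantomData<State>,
}

impl Inode<New> {
    fn allocate(number: u64) -> Self {
        Inode { number, size: 0, _state: PhantomData }
    }

    /// Consumes the uninitialized inode and returns an initialized one.
    fn init(self, size: u64) -> Inode<Ready> {
        Inode { number: self.number, size, _state: PhantomData }
    }
}

impl Inode<Ready> {
    fn number(&self) -> u64 {
        self.number
    }

    /// Only exists on initialized inodes; the compiler enforces the rule.
    fn size(&self) -> u64 {
        self.size
    }
}

fn describe(number: u64, size: u64) -> (u64, u64) {
    let fresh = Inode::<New>::allocate(number);
    // fresh.size(); // would not compile: no `size` method on Inode<New>
    let ready = fresh.init(size);
    (ready.number(), ready.size())
}

fn main() {
    let (number, size) = describe(42, 4096);
    println!("inode {number} has size {size}");
}
```

The person writing `describe` does not need to remember the ordering rule; forgetting `init` is a compile error rather than a tacit convention.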

mjburgess
7 replies
1d6h

That cannot scale.

lol... you're talking about the linux kernel, written in C.

The vast majority of software over many decades "bottoms out" in C whether in VMs, operating systems, device drivers, etc.

The scale of the success of C is unparalleled.

pjc50
4 replies
1d5h

The scale of C adoption is certainly unparalleled over the past 40 or so years, but so are the safety issues in the cyberwarfare era.

https://www.whitehouse.gov/oncd/briefing-room/2024/02/26/pre...

If, somehow, we'd got to an era where (a) operating systems were widely deployed in a different language, and (b) the Morris Worm of 1988 had happened due to buffer overflow issues, then C in its current form would never have been adopted.

mjburgess
3 replies
1d5h

C is just convenient assembly. In an era where performance mattered, and much software was written for hardware, and controlling hardware, it's hard to see an alternative.

C's choices were for performance on hardware-limited systems. I don't really see what other ones made sense historically.

pjc50
0 replies
1d4h

C is, in some important cases, less convenient than assembly in ways which have to be worked round either fooling the compiler or adding intrinsics. A recent example: https://justine.lol/endian.html

Is the huge macro more convenient than the "bswap" instruction? No, but it's portable.

I don't really see what other ones made sense historically.

Pascal chose differently in a couple of places. In particular, carrying the length with strings.

C refused to define semantics for arithmetic. This gave you programs which were "portable" so long as you didn't mind different behavior on different platforms. Good for adoption, bad for sanity. Only relatively recently (C23) did the standard require two's-complement representation for signed integers, and signed overflow is still undefined behavior.
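For contrast, a short sketch of how Rust spells out these choices using standard integer methods (`overflow_demo` is just a name for this example). Plain `+` on overflow panics in debug builds and wraps in release builds by default, while the methods below make the intended behavior explicit.

```rust
/// Four explicit ways to handle `i32::MAX + 1`.
fn overflow_demo() -> (Option<i32>, i32, i32, (i32, bool)) {
    let max = i32::MAX;
    (
        max.checked_add(1),     // None: overflow is reported
        max.wrapping_add(1),    // i32::MIN: two's-complement wraparound
        max.saturating_add(1),  // i32::MAX: clamps at the limit
        max.overflowing_add(1), // (i32::MIN, true): wrapped value plus flag
    )
}

fn main() {
    let (checked, wrapped, saturated, overflowed) = overflow_demo();
    println!("{checked:?} {wrapped} {saturated} {overflowed:?}");

    // Byte swapping is a built-in method too, with no macro needed.
    println!("{:#010x}", 0x1122_3344u32.swap_bytes());
}
```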

16-bit Windows even used C with the Pascal calling convention. http://www.c-jump.com/CIS77/ASM/Procedures/P77_0070_pascal_s...

kelnos
0 replies
20h34m

C is just convenient assembly.

I'm not sure if you're being facetious here, but that's absurd. It is certainly one of our lowest-level options before reaching for assembly, but it's still a high-level language that abstracts machine details from the programmer.

In an era where performance mattered, and much software was written for hardware, and controlling hardware, it's hard to see an alternative.

During that era, people who really needed to care about performance used assembly. The optimizations done by C compilers at that time were not nothing, but they were fairly primitive compared to what they do now.

another2another
0 replies
1d4h

In an era where performance mattered, and much software was written for hardware, and controlling hardware, it's hard to see an alternative

Actually, what made sense _was_ assembly when performance mattered above all. C was actually seen as a higher level language.

However C's advantage was the fact that it was cross platform, so you could compile or quite easily port the same code to many different platforms with a C compiler (Solaris,Windows,BSD,Linux and latterly Mac OSX). That was its strength (pascal shared this too, but it didn't survive).

You can see this in the legacy of software that's still in use today - lots of gnu utilities, shells, X windows, the zlib library, the gcc, openssl and discussed fairly recently POV Ray which has been going since the 80's.

freeone3000
0 replies
1d5h

But it doesn’t have to. We can choose any other language that compiles to native, including memory-safe ones.

dxroshan
0 replies
1d6h

I agree with you.

pjc50
1 replies
1d6h

But you can't do everything that C does without using unsafe blocks

How much of this is actually 100% unambiguously necessary? Is there a good reason why anything in the filesystem code at all needs to be unsafe?

I suspect it's a very small subset needed in a few places.

nicce
0 replies
1d1h

Usually avoidance of copying or moving data is the primary reason. In filesystems, this is quite highlighted.

bigstrat2003
0 replies
1d3h

But you can't do everything that C does without using unsafe blocks. Rust can offer a fresh perspective to these problems, but it's not a complete solution.

It's true that you need to have unsafe code to do low level things. But it's a misconception that if you have to use unsafe then Rust isn't a good fit. The point of the safe/unsafe dichotomy in Rust is to clearly mark which bits of the code are unsafe, so that you can focus all your attention on auditing those small pieces and have confidence that everything else will work if you get those bits right.

gritzko
39 replies
1d7h

From the minutes I conclude that Rust-in-the-kernel looks like an additional complexity tax. I mean, if you write an OS from scratch, you can use the full power of your language. Plastering it to the side of an already vast codebase creates additional issues, as we see here.

mnau
25 replies
1d7h

additional complexity tax

Yes, but that should be offset by easier driver development. See the blog about the Rust GPU driver for Asahi Linux, done in one month. EDIT: Google "tales of the m1 gpu" (author has a very negative opinions about hacker news, read if you like by clicking the link https://asahilinux.org/2022/11/tales-of-the-m1-gpu/)

Is it universal? We'll see in coming years.

eru
20 replies
1d7h

Alas, that link just gets you a rant about politics, if you click on it directly.

Copy-and-pasting works.

olivermuty
9 replies
1d6h

"Rant about politics", haha. Or as other people like to call it: "A real concern described in an apt manner".

I have observed these inflammatory sub-graphs of comments myself and have thought to myself that this must be a huge growing grounds for unmoderated and unwanted behaviour because it more or less becomes invisible once flagged enough.

erdii
7 replies
1d6h

In this specific case complaining about "politics" gets the sour by-taste of enabling (or at least not condoning) harassment to the point of single folks taking their own lives over it.

Why?! Even if you're not sure what to think about the queer movement; even if you have already made up your mind about the queer movement and oppose their ideas or some of them; I refuse to believe that any single person would not want to stop someone from bullying someone else into their own suicide!

hastily jotted rant for the folks who'd like to complain about "politics" from creeping into every discussion everywhere:

It's really sad to see so many folks disconnecting and immediately dismissing whole groups of other folks as soon as they start complaining about an issue they have because of "politics". :(

I get that you don't want to get involved in shit flinging shows and that its tedious to figure out who's in the right and who's in the wrong. Especially because there are never clear answers. If you feel like this and then proceed to complain about 'politics' creeping everywhere, please beware of this:

Pretending to be apolitical doesn't work most of the time, as politics is basically another word for "acting (or deliberately not-acting) in some kind of public sphere" which you all do, and when the "politics" have arrived at a topic, then they'll stay there at least in that specific case you are witnessing! You just are part of a hyperconnected and confusing world with a lot of conflict, whether you like it or not.

Pretending to be apolitical also serves the upholding of whatever status quo is currently in place because anything that has even a slight chance of changing anything is inherently a political topic.

Please don't turn your heads on "political" topics or, at least, don't complain about it in that way as it mostly enables unjust behaviour to continue. It doesn't even matter if it's the person who brought up the "political" stuff who is acting unjust or the folks they're complaining about. In both cases it's probably better to either avoid commenting at all or to convey your critical thoughts to that "political" conversation.

pjc50
4 replies
1d6h

I refuse to believe that any single person would not want to stop someone from bullying someone else into their own suicide!

There are plenty of people who do want the freedom to say exactly what they choose, including a lengthy period of directed harassment, and shrug their shoulders if someone commits suicide over it. There's not much that can be done other than ban them from civilized spaces.

pessimizer
3 replies
1d5h

"Bullying" is a judgement. Intrinsic to the word is a judgement that what is being done is bad, and the person doing it would not describe it that way.

And by that I don't mean that it's not bad to bully people (it's rather tautological), I mean that talk of bullying often begs the question, and is intentionally done in order to elide past the actual events that occurred. Lèse majesté laws against talking about the King, elected officials, or even cops or bureaucrats now get justified as anti-bullying.

I missed any rant about politics in the blog however. But this thread has a smell of "my politics aren't political because they are true, and your politics are political because they are lies."

jcranmer
1 replies
1d2h

What the referer-replacement page was talking about is Kiwi Farms, which is doing the kind of stuff that even the US First Amendment's very expansive protections fail to protect. (The criminal liability here is "intentional infliction of emotional distress", although note that most lawsuits that allege that are groundless and largely fail to make it past the motion-to-dismiss stage for failure to state a claim, as "they made me feel bad" isn't sufficient to allege an IIED.)

nyssos
0 replies
1d

The criminal liability here is "intentional infliction of emotional distress",

Intentional infliction of emotional distress is a tort, not criminal.

keybored
0 replies
1d2h

I mean this is correct. Bullies, or at least the adult ones, don’t call what they do bullying (except the absolute geniuses that think “actually bullying has a social corrective behavior, I’m just helping actually”).

We’re all just busting each other’s balls, right? Look at George, he’s laughing! He’s totally in on the joke and not at all just showing submissive deference in order to not lose face.

logicprog
0 replies
1d5h

I'm glad at least some other people on this horrible site feel this way.

im3w1l
0 replies
1d1h

To me, you have to distinguish between harassment directed at a person, and on the other hand discussion about a person.

It is not possible to use hacker news to send messages to someones email. It is not even possible to send a dm to another hacker news user. You could potentially imagine hacker news being used to organize harassment of someone, but I have never seen either accusations or evidence of such a thing.

So then we have established, that since harassment directed at them is impossible, the issue they have is that people on hacker news write bad things about them.

Next, those things seem to often be flagged or downvoted, reducing exposure. But that is apparently not enough, because they can be found on google. So here we arrive at the core issue. There is content on Google about this person, that they would rather not be on Google. This is the complaint. So this person is basically saying that if there is unfavorable coverage of them findable on google, that is harassment, and it needs to go. If it isn't purged it's bullying that could lead to suicide.

This is a very ambitious "landgrab" if you will, and it starts to seriously infringe on other peoples rights.

It's similar in that manner to other things like "stop terrorism" or "think of the children". Yes clearly harassment is bad, and terrorism is bad, and pedophiles are no good. But we can't completely give up on our freedoms because of that.

lelanthran
0 replies
1d4h

Or as other people like to call it: "A real concern described in an apt manner".

Oh please. Every activist for every marginal issue says the same thing. Doesn't make it true.

aniviacat
9 replies
1d5h

I can't find a rant in the link. Was the link changed or did I overlook something?

jeroenhd
6 replies
1d5h

The author of the website details their issue with the way HN does moderation (which I can't say I disagree with, especially after HN intentionally disabled referrer headers for websites that take issue with HN). This only shows up if HN is in the referer URL.

I wouldn't call it a rant, but rather a polite request for HN policy to change.

yjftsjthsd-h
5 replies
1d2h

I wouldn't call it a rant, but rather a polite request for HN policy to change.

Which is made by blocking people with no ability to do anything about it.

mcronce
4 replies
1d1h

What? All you need to do is resubmit the request without a Referer header. For me, using Firefox, this meant clicking in the address bar, changing nothing, and hitting enter.

That's hardly "no ability to do anything about it".

yjftsjthsd-h
3 replies
1d1h

I mean people with no ability to alter HN moderation policy.

Consider it like this:

    1. I think, "oh, that looks relevant, let me open that link".
    2. I get a screenful of objections to HN moderation.
    3. I shrug and close the tab.

Since I'm just a normal user who can't change HN moderation, the outcome is that HN doesn't change but I walk away with a worse opinion of the Asahi Linux folks.

lanstin
1 replies
23h23m

Weird: HN is supposed to be technical but the people behind Asahi Linux have proven themselves to be technical wizards; meanwhile HN seems more interested in money and pseudo-libertarian politics than transformative technical excellence. But such is our age.

eru
0 replies
17h43m

HN has actually drifted far, far to the left of where it started.

It was called Startup News at the beginning. Of course, it's gonna be interested in money and money making. Duh.

mcronce
0 replies
16h14m

It certainly didn't worsen my opinion of the Asahi Linux folks. On the contrary, I commend them for standing up for what they feel is right.

I also doubt they really care whether or not any particular HN reader has a positive or negative opinion of them.

amiga386
0 replies
1d4h

If you click the link - any link to asahilinux.org from HN - it should start "Hi! It looks like you might have come from Hacker News.", followed by Hector Martin ranting that he isn't in charge of the moderation policy of HN.

The response given in Arkell v. Pressdram is appropriate.

Phelinofist
0 replies
1d

Also didn't show for me with disabled JS

KallDrexx
1 replies
1d6h

Maybe I'm reading something wrong but the discussion this HN posting is about sounds very much about trying to make a Linux subsystem and API in Rust, so that Rust's type system can enforce conformance to safety mechanisms via its abstraction.

That's fundamentally different from, and harder than, a driver written in Rust that uses the C APIs of Linux's subsystems.

I can see a lot of drivers taking on the complexity tax of being written in Rust easily. The complexity tax on writing a whole subsystem in Rust seems like an exponentially harder problem.

wongarsu
0 replies
1d3h

You could just write rust code that calls the C APIs, and that would probably avoid a lot of discussions like the one in the article.

But making good wrappers would make development of downstream components even easier. As the opponents in the discussion said: there's about 50 filesystem drivers. If you make the interface better that's a boon for every file system (well, every file system that uses rust, but it doesn't take many to make the effort pay off). You pay the complexity tax once for the interface, and get complexity benefits in every component that uses the interface.

We would have the same discussions about better C APIs, if only C was expressive enough to allow good abstractions.

nicce
0 replies
1d1h

author has a very negative opinions about hacker news

I am not sure if author (Asahi Lina) has, but the project lead Hector Martin definitely has.

josephcsible
0 replies
23h30m

Warning: Don't click that link. Copy and paste the URL instead. That site serves only verbal abuse and harassment to people that it detects are HN users.

thesnide
7 replies
1d7h

While I agree there are benefits to rust, I tend to think all reason cannot fight hype.

The tax will be seen as a necessity to embrace future and progress.

I'm wondering why we do not restrict ourselves to a safe subset instead of jumping onto a huge bandwagon of unknown bugs and tradeoffs.

pjc50
4 replies
1d6h

There is no "safe subset" of C. MISRA is fairly close, but all sorts of things that you might need, like integer arithmetic, have potential UB in C.

(The best current effort is https://sel4.systems/ , which is written in C but has a large proof of safety attached. The language design question is basically: should the proof be part of the language?)

regularfry
2 replies
1d6h

Given that undefined behaviour just means "undefined by the standard", do you get usefully closer to being able to identify a safe subset with the (MISRA/alternative, specific compiler, specific architecture) triple?

pjc50
1 replies
1d5h

No, undefined behavior does not mean "not defined by the standard", it means those places where the standard says "undefined behavior". And then the long and complicated war over "the compiler may assume that UB does not happen and then optimize on that basis".

You might be able to tighten it up in some specific cases, and those battles are being fought elsewhere, but there's stuff like lock lifetimes which you cannot do without substantial extra annotations inside or outside the language.

regularfry
0 replies
1d5h

Sorry, yes - poor wording on my part.

PhilipRoman
0 replies
1d5h

I found frama-c to be pretty good, including all the integer quirks

germandiago
0 replies
1d4h

I like Linus's view on evolution. Evolution will tell what the most sensible choices are over time. It is like "the market" in some way. Let everyone make their bets, wait, see, analyze, research. That's it.

Yoric
0 replies
1d6h

Safe subset of what?

vollbrecht
3 replies
1d7h

One can argue that any additional code is introducing complexity, not only writing Rust. Does that mean we should just stop innovating and go into an indefinite state of maintenance, since we are already so vast?

A tax in one place may not be a net negative if, as in the real world, it is used to offset other problems. And just saying it will not offset any problems because of a single discussion, one that does not have a definite conclusion, comes off as a weak argument.

another2another
1 replies
1d6h

Does that mean we should just stop innovating and go into an indefinite state of maintenance

If you mean that not using Rust (or maybe some other languages e.g. Zig or Ada?) means that there can be no innovation in the Linux kernel, I would have to disagree since there's been plenty of progress in plain old c (see for instance io_uring), not to mention the fact that the c language itself could change to make developer ergonomics better - since that seems to be the nub of the problem.

It also raises the question of what happens in the future when Rust is no longer the language du jour - how do we know it's going to last the course? And now there's 2 different codebases, potentially maintained by 2 different diminishing sets of active maintainers.

vollbrecht
0 replies
1d5h

If you mean that not using Rust (or maybe some other languages e.g. Zig or Ada?) means that there can be no innovation in the Linux kernel, I would have to disagree since there's been plenty of progress in plain old c.

No, I didn't mean that. If I understand OP correctly here, he argued that using Rust is a tax, that a tax is always bad, and thus that it should be avoided.

We obviously can't know the future. We also can't know what future maintainers will look like, or whether there will be a greater abundance of people who understand kernel-level C, kernel-level Rust, or both.

I also don't think that any one developer can claim to fully get every part of the Linux kernel. So if a person wants to work on a particular subsection, they need to make themselves familiar with it, independent of the language used. And then we are back at the argument: is the additional tax bad, or what does it bring to the table?

riku_iki
0 replies
23h15m

One can argue that any additional code is introducing complexity

additional code can replace more complex existing or future code, thus can reduce complexity

Already__Taken
0 replies
1d7h

reads a lot like letting perfect be the enemy of good.

gwbas1c
31 replies
1d4h

Maybe they are asking the wrong questions?

Does Rust need to change to make it easier to call C?

I've done a bit of Rust, and (as a hobbyist,) it's still not clear (to me) how to interoperate with C. (I'm sure someone reading this has done it.) In contrast, in C++ and Objective C, all you need to do is include the right header and call the function. Swift lets you include Objective C files, and you can call C from them.

Maybe Rust as a language needs to bend a little in this case, instead of expecting the kernel developers to bend to the language?

lambda
20 replies
1d3h

Calling C from Rust can be quite simple. You just declare the external function and call it. For example, straight out of the Rust book https://doc.rust-lang.org/book/ch19-01-unsafe-rust.html#usin... :

  extern "C" {
      fn abs(input: i32) -> i32;
  }

  fn main() {
      unsafe {
          println!("Absolute value of -3 according to C: {}", abs(-3));
      }
  }

Now, if you have a complex library and don't want to write all of the declarations by hand, you can use a tool like bindgen to automatically generate those extern declarations from a C header file: https://github.com/rust-lang/rust-bindgen

There's an argument to be made that something like bindgen could be included in Rust, not requiring a third party dependency and setting up build.rs to invoke it, but that's not really the issue at hand in this article.

The issue is not the low-level bindings, but higher level wrappers that are more idiomatic in Rust. There's no way you're going to be able to have a general tool that can automatically do that from arbitrary C code.
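To make the distinction concrete, here is a minimal sketch of a hand-written wrapper over a raw binding. `strlen` is the real C library function; `c_string_len` is a hypothetical name made up for illustration. The `unsafe extern` spelling assumes a recent toolchain (older ones write plain `extern "C"`, as in the snippet above).

```rust
use std::ffi::{c_char, CString};

// Raw binding: the kind of declaration bindgen would generate from a header.
// `strlen` lives in the platform C library, which Rust's std already links.
unsafe extern "C" {
    fn strlen(s: *const c_char) -> usize;
}

/// Hypothetical hand-written wrapper. It encodes the rules a C caller has to
/// remember (NUL-terminated, no interior NUL, buffer alive during the call)
/// so that the Rust caller cannot get them wrong.
fn c_string_len(text: &str) -> Option<usize> {
    // CString::new fails if `text` contains an interior NUL byte.
    let owned = CString::new(text).ok()?;
    // SAFETY: `owned` is a valid NUL-terminated buffer that outlives the call.
    Some(unsafe { strlen(owned.as_ptr()) })
}

fn main() {
    println!("{:?}", c_string_len("inode")); // Some(5)
    println!("{:?}", c_string_len("bad\0input")); // None
}
```

The extern block is mechanical; deciding that the wrapper should take `&str`, return `Option`, and own the temporary buffer is the part a generator can't infer from the C header.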

varjag
16 replies
1d3h

That's not really "simple", it's on par with C FFI in about any other language (except C++), with same drawbacks.

commodoreboxer
6 replies
1d2h

It's on par with C++, too. In C++ you need an `extern "C"`, because C++ linkage isn't guaranteed to be the same as C linkage. You can get away with wrapping that around it in a preprocessor conditional, but that's not all that much easier than Rust's bindgen.

A lot of C to C++ interop is actually done wrong without knowing it. Throwing a C++ static function as a callback into a C function usually works, but it's not technically correct because the linkage isn't guaranteed to be the same without an extern "C". In practice, it usually is the same, but this is implementation-defined, and C++ could use a different calling convention from C (e.g. cdecl vs fastcall vs stdcall. The Borland C++ compiler uses fastcall by default for C++ functions, which will make them illegal callbacks for C functions).

The major difference between Objective-C and C++'s C interop and other languages is the lack of the preprocessor. Macros will just work because they use the same preprocessor. That's really not easy to paper over in other languages that can't speak the C preprocessor.

spacechild1
5 replies
19h53m

I think you're confusing some terms here.

In C++ you need an `extern "C"`, because C++ linkage isn't guaranteed to be the same as C linkage.

`extern "C"` has nothing to do with linkage, all it does is disable name mangling, so you get the same symbol name as with a C compiler.

Throwing a C++ static function as a callback into a C function usually works, but it's not technically correct because the linkage isn't guaranteed to be the same without an extern "C".

Again, linkage is not relevant here. Your C++ callbacks don't have to be declared as extern "C" either, because the symbol name doesn't matter. As you noted correctly, the calling conventions must match, but in practice this only matters on x86 Windows. (One notable example is passing callbacks to Win32 API functions, which use `stdcall` by default.) Fortunately, x86_64 and ARM did away with this madness and only have a single calling convention (per platform).

LegibleCrimson2
4 replies
16h30m

`extern "C"` has nothing to do with linkage, all it does is disable name mangling, so you get the same symbol name as with a C compiler.

extern "C" also ensures that the C calling convention is used, which is relevant for callbacks. It's not just name mangling. This is the reason that extern "C" static functions exist. You can actually overload a C++ function by extern "C" vs extern "C++", and it will dispatch it appropriately based on whether the passed in function is declared with C or C++ linkage.

And I'm not sure the terms are confused, because that's how most documentation refers to it: https://learn.microsoft.com/en-us/cpp/cpp/extern-cpp?view=ms...

In C++, when used with a string, extern specifies that the linkage conventions of another language are being used for the declarator(s). C functions and data can be accessed only if they're previously declared as having C linkage. However, they must be defined in a separately compiled translation unit.

And https://en.cppreference.com/w/cpp/language/language_linkage

The post you're replying to had it completely right. extern "C" is entirely about linkage, which includes calling convention and name mangling.

As you noted correctly, the calling conventions must match, but in practice this only matters on x86 Windows.

Or if you want your program to actually be correct, instead of just incidentally working for most common cases, including on future systems.

If you're passing a callback to a C function from C++, it's wrong unless the callback is declared extern "C".

spacechild1
3 replies
6h52m

extern "C" also ensures that the C calling convention is used, which is relevant for callbacks. It's not just name mangling.

I stand corrected. I didn't know that `extern "C"` enforces the C calling convention.

However, on modern platforms this doesn't really matter because, as I said, there is only a single calling convention (per platform). And I'm pretty sure that future platforms will keep it that way. Fortunately, if you try to pass a C++ callback of the wrong calling convention, you get a compiler error.

If you're passing a callback to a C function from C++, it's wrong unless the callback is declared extern "C".

That's certainly not true because `extern "C"` is not the only way to specify the calling convention. In fact, you might need a different calling convention! As I mentioned, on x86 the Windows API uses stdcall for all API functions and callbacks, so `extern "C"` would be wrong. If you look at the Microsoft examples, you will see that they declare the callbacks as WINAPI (without `extern "C"`): https://learn.microsoft.com/en-us/windows/win32/procthread/c...

So I stand by my point that in practice you don't need `extern "C"` for passing C++ callbacks to C functions. You can pass a lambda function just fine, and when it doesn't work the compiler will tell you.

LegibleCrimson2
2 replies
4h27m

A couple big caveats here:

* cdecl is a platform specific calling convention. There is no standard C ABI. cdecl is a wintel thing, not the standard C calling convention. On Linux, this is the System V ABI for instance. On Windows ARM, it's also not cdecl.

* Specifying calling convention at all is a compiler specific extension. There is no standard way of specifying a C calling convention without `extern`.

So specifying cdecl gets you the right calling convention on some platforms and ties your code to some specific compilers. The only portable way to specify C linkage in a C++ program is extern "C". You will always get the right ABI for your platform and it will work on every compiler.

So I stand by my point that in practice you don't need `extern "C"` for passing C++ callbacks to C functions. You can pass a lambda function just fine, and when it doesn't work the compiler will tell you.

The compiler will very often not tell you. It will complain if the lambda can't be coerced to a function pointer (because it's a closure) or if the argument or return types are wrong. An incorrect ABI will usually be accepted and will just do the wrong thing or crash at runtime. The C++ standard says that language linkage is part of a function's type, but very few compilers actually support this.

Your position works sometimes for some compilers and some platforms. I assert that it's better to use standard C++ features and just work everywhere.

spacechild1
1 replies
3h1m

* Specifying calling convention at all is a compiler specific extension.

Yes, because the calling conventions themselves are platform/compiler specific.

There is no standard way of specifying a C calling convention without `extern`.

Well, on modern platforms you don't need to because there is only a single calling convention that is shared between C and C++. For legacy platforms with multiple calling conventions, you need compiler specific extensions by definition.

The only portable way to specify C linkage in a C++ program is extern "C". You will always get the right ABI for your platform and it will work on every compiler.

Again, on platforms with several calling conventions `extern "C"` absolutely won't give you the appropriate calling convention all the time. See again my Win32 API example.

The compiler will very often not tell you. [...] An incorrect ABI will usually be accepted and will just do the wrong thing or crash at runtime.

That's absolutely not my experience! Functions with different calling conventions have different types, so a C++ compiler must reject such code. See https://godbolt.org/z/6EnncE5v5. (Note that for the lambda case MSVC is smart enough to automatically add __stdcall whereas MinGW refuses to compile. The free function is rejected by both compilers.)

Can you show me an actual example where a C++ compiler silently accepts a function with the wrong calling convention?

Your position works sometimes for some compilers and some platforms.

It has always worked for me so far and I write software for many different platforms.

LegibleCrimson2
0 replies
2h31m

Ah, yeah, you're right. I was spacing on the fact that C as well as C++ can have multiple calling conventions. I blame early morning brain.

As far as the wrong calling convention goes, I'm basing it on the fact that an extern "C++" function can be passed as a callback where an extern "C" is demanded. Even if they're the same calling convention, that should fail, but it doesn't. Looks like it doesn't fail at runtime, which is a small comfort, but given the different permissiveness of different compilers, it still makes me very nervous to pass a C++ function as a C callback and just hope that it works, given that it isn't guaranteed in the standard.
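For comparison with the Rust side of this thread: as far as I know, Rust makes the ABI part of the function pointer type, so this class of mismatch is a compile error rather than something you hope works. A small sketch, where `apply_twice` stands in for a C API and is written in Rust only to keep the example self-contained (all names here are made up):

```rust
// Stand-in for a C API that takes a callback. The parameter type spells out
// the ABI the callback must use.
fn apply_twice(callback: extern "C" fn(i32) -> i32, value: i32) -> i32 {
    callback(callback(value))
}

// Uses the platform's C calling convention.
extern "C" fn add_one_c(x: i32) -> i32 {
    x + 1
}

// Uses the default Rust ABI, which is a different function type.
fn add_one_rust(x: i32) -> i32 {
    x + 1
}

fn main() {
    println!("{}", apply_twice(add_one_c, 40));

    // Uncommenting the next line is rejected with a mismatched-types error,
    // because a Rust-ABI function is not an `extern "C"` function pointer:
    // apply_twice(add_one_rust, 40);

    // The Rust-ABI function only fits a Rust-ABI pointer type.
    let plain: fn(i32) -> i32 = add_one_rust;
    println!("{}", plain(1));
}
```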

gizmo686
4 replies
1d2h

... And? Most languages make C interop simple.

varjag
3 replies
1d2h

They quickly become unwieldy on non-trivial APIs, with hundreds of definitions across dozens of files and with macros to boot. Naturally people would still get the job done but it's beyond simple.

mcronce
2 replies
1d2h

That's what bindgen is for, as was mentioned in the original comment you replied to.

varjag
1 replies
23h14m

How well does it handle preprocessor macros in APIs?

marshray
0 replies
19h58m

I have used it successfully against header files for Win32 COM interfaces generated from IDL which include major parts of the infamous "windows.h". Almost every type is a macro.

This is an extremely well-understood space.

Just open the docs and do it.

kelnos
3 replies
22h47m

How is that not simple? You just declare the function and then call it. I find it hard to imagine how it could be any more simple than that.

varjag
2 replies
21h21m

Now imagine a hundred or two functions, structures and callbacks, some of them exposed only as CPP macros over internal implementation. PJSIP low level API is one example.

lambda
0 replies
15h0m

But... that's what bindgen is for. Which I mentioned.

I said it "can be quite simple"; for simple use cases, just using extern and translating the declarations by hand is perfectly viable.

For more complex cases, you use bindgen.

googh
0 replies
11h25m

Can someone shed some light on why the parent comment (by varjag) is downvoted?

jacobgorm
1 replies
22h36m

Passing integers around is easy, sharing structs or strings and context pointers for use in callbacks crossing the language barrier etc is typically much harder.
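A rough sketch of the callback-plus-context-pointer case, to show where the extra care goes. `for_each` stands in for a C library function and is written in Rust here only so the example runs on its own; every name is hypothetical.

```rust
use std::ffi::c_void;

// Callback signature a C library would typically expose: a value plus an
// opaque "user data" pointer that the library passes back untouched.
type Visitor = extern "C" fn(value: i32, context: *mut c_void);

// Stand-in for the C side: calls `visit` once per element.
// Unsafe because the caller must guarantee `items` points to `len` values.
unsafe extern "C" fn for_each(
    items: *const i32,
    len: usize,
    visit: Visitor,
    context: *mut c_void,
) {
    for i in 0..len {
        // SAFETY: guaranteed by the caller, see above.
        let value = unsafe { *items.add(i) };
        visit(value, context);
    }
}

// Rust callback: recovers its state from the opaque pointer.
extern "C" fn accumulate(value: i32, context: *mut c_void) {
    // SAFETY: `context` was created from `&mut i64` in `sum_via_callback`
    // and is not aliased while the callback runs.
    let total = unsafe { &mut *(context as *mut i64) };
    *total += i64::from(value);
}

// Safe wrapper: the pointer casts and lifetime reasoning stay in one place.
fn sum_via_callback(items: &[i32]) -> i64 {
    let mut total: i64 = 0;
    let context = &mut total as *mut i64 as *mut c_void;
    // SAFETY: pointer and length come from the same slice, and `total`
    // outlives the call.
    unsafe { for_each(items.as_ptr(), items.len(), accumulate, context) };
    total
}

fn main() {
    println!("{}", sum_via_callback(&[1, 2, 3]));
}
```

None of the casts are checked by the compiler, which is the "much harder" part: the type of whatever sits behind `context` is a convention between the two sides.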

Someone
0 replies
19h52m

For rust code calling C, sharing structs is doable with #[repr(C)]. See https://doc.rust-lang.org/reference/type-layout.html#reprc-s...

(Nitpick: I don’t think it technically is correct to call this “The C representation”, as struct layout in C depends on the C compiler/ABI. I wouldn’t trust this to be good enough for serializing data between 32-bit and 64-bit systems, for example. For calling code on the same system, it’s good enough, though.)
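A small sketch of what `#[repr(C)]` buys you, assuming a common ABI where `u32` is 4-byte aligned. `Record` and `record_layout` are hypothetical names used only for this illustration.

```rust
use std::mem::{align_of, offset_of, size_of};

/// Hypothetical record shared with C. The matching C declaration would be
/// `struct record { uint8_t tag; uint32_t id; uint16_t flags; };`
#[allow(dead_code)]
#[repr(C)]
struct Record {
    tag: u8,
    id: u32,
    flags: u16,
}

/// Returns (offset of `id`, offset of `flags`, total size, alignment).
fn record_layout() -> (usize, usize, usize, usize) {
    (
        offset_of!(Record, id),
        offset_of!(Record, flags),
        size_of::<Record>(),
        align_of::<Record>(),
    )
}

fn main() {
    // With repr(C) the fields stay in declaration order with C-style padding.
    // Without it, the compiler is free to reorder fields.
    let (id_off, flags_off, size, align) = record_layout();
    println!("id at {id_off}, flags at {flags_off}, size {size}, align {align}");
}
```

On such an ABI this gives offsets 4 and 8, size 12, and alignment 4. Swap a field for something pointer-sized and the numbers change between 32-bit and 64-bit targets, which is the serialization caveat above.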

Smaug123
3 replies
1d

The point is that Rust can model invariants that C can't. You can call both ways, but if C is incapable of expressing what Rust can, that has important implications for the design of APIs which must be common to both.

gwbas1c
2 replies
21h1m

That's not how I interpreted it: There is a clear need to be able to write filesystems in Rust, and the kernel developer(s) who write the filesystem API don't want to have to maintain the bindings to Rust.

Smaug123
1 replies
11h0m

They say this in almost every paragraph! For example, five of the first seven paragraphs:

The first is to express more of the requirements using Rust's type system in order to catch more mistakes at compile time.

Almeida showed an example of how the Rust type system can eliminate certain kinds of errors.

… it was exactly that kind of discussion/argument that could be avoided by encapsulating the rules into the Rust types and abstractions; the compiler will know the right thing to do.

… All of that is enforced through the type system.

the whole idea is to determine what the constraints are from Viro and other filesystem developers, then to create types and abstractions that can enforce them.

More explicitly:

The object lifecycles are being encoded into the Rust API, but there is no equivalent of that in C; if someone changes the lifecycle of the object on one side, the other will have bugs.

As those changes occur, "we will find out whether or not this concept of encoding huge amounts of semantics into the type system is a good thing or a bad thing".

gwbas1c
0 replies
7h31m

In addition, when the C code changes, the Rust code needs to follow along, but who is going to do that work? Almeida agreed that it was something that needs to be discussed.

FWIW: I shipped a Windows file system driver in 2020. The api hadn't changed in years. Does Linux's API for kernel-space filesystems really change so rapidly that keeping the rust bindings up-to-date would be a considerable amount of work, in the long run?

tupshin
1 replies
1d4h

This is not a notable challenge in rust, nor relevant to the article.

The article is about finding ways of using rust to actually implement kernel fs drivers/etc. Note that any rust code in the kernel is necessarily consuming C interfaces.

Bindgen works quite well for the use case that you are thinking.

https://github.com/rust-lang/rust-bindgen

moomin
0 replies
1d3h

Yeah, the Rust proponents are being significantly more ambitious. Not just the ability to code a file system in Rust, but do it in a way that catches a lot of the correctness issues relating to the complex (and changing) semantics of FS development.

kelnos
0 replies
22h48m

Does Rust need to change to make it easier to call C?

No, because it's already dirt-simple to do. You just declare the C function as 'extern "C"', and then call it. (You will often need to use 'unsafe' and convert or cast references to raw pointers, but that's simple syntax as well.)

There are tools (bindgen being the most used) that can scan C header files and produce the declarations for you, so you don't have to manually copy/paste and type them yourself.

Maybe Rust as a language needs to bend a little in this case, instead of expecting the kernel developers to bend to the language?

I think you maybe misunderstood the article? There's nothing wrong with the language here. The argument is around how Rust should be used. The Rust-for-Linux developers want to encode semantics into their API calls, using Rust's features and type system, to make these calls safer and less error-prone to use. The people on the C side are afraid that doing so will make it harder for them to evolve the behavior and semantics of their C APIs, because then the Rust APIs will need to be updated as well, and they don't want to sign up for that work.

An alternative that might be more palatable is to not make use of Rust features and the type system in order to encode semantics into the Rust API. That way, it will be easier for C developers, as updating the Rust API when the C API changes will be mechanical and simple to do. But then we might wonder what the point is of all this Rust work if the Rust-for-Linux developers can't use some Rust features to make better, safer APIs.

I've done a bit of Rust, and (as a hobbyist,) it's still not clear (to me) how to interoperate with C.

Kinda weird that you currently have the top-voted comment when you admit you don't understand the language well enough to have an informed opinion on the topic at hand.

duped
0 replies
1d3h

It's actually pretty easy. All you need is declare `extern "C" fn foo() -> T` to be able to call it from Rust, and to pass the link flags either by adding a #[link] attribute or by adding it in a build.rs.

You can use the bindgen crate to generate bindings ahead of time, or in a build.rs and include!() the generated bindings.

Normally what people do is create a `-sys` crate that contains only bindings, usually generated. Then their code can `use` the bindings from the sys crate as normal.

in contrast, in C++ and Objective C, all you need to do is include the right header

and link against the library.

codetrotter
0 replies
1d4h

I’ve written Rust code that called C++

It wasn’t completely straightforward, but on the whole I figured out everything I needed to within a few days in order to be able to do it.

Calling C would surely be very similar.

pornel
5 replies
1d5h

I don't get how can each file system have a custom lifecycle for inodes, but still use the same functions for inode lifecycle management, but apparently with different semantics? That sounds like the opposite of an abstraction layer, if the same function must be used in different ways depending on implementation details.

If the lifecycle of inodes is filesystem-specific, it should be managed via filesystem-specific functions.

seanhunter
0 replies
1d

If you haven't seen it before, you might find this useful https://www.kernel.org/doc/html/latest/filesystems/vfs.html

It's an overview of the VFS layer, which is how they do all the filesystem-specific stuff while maintaining a consistent interface from the kernel.

sandywaffles
0 replies
1d3h

I understood it as they're working to abstract as much as is generally and widely possible in the VFS layer, but there will still be (many?) edge cases that don't fit and will need to be handled in FS-specific layers. Perhaps the inode lifecycle was just an initial starting point for discussion?

phkahler
0 replies
1d4h

> I don't get how can each file system have a custom lifecycle for inodes, but still use the same functions for inode lifecycle management, but apparently with different semantics?

I had the same question. They're trying to understand (or even document) all the C APIs in order to do the rust work. It sounds like collecting all that information might lead to some [WTFs and] refactoring so questions like this don't come up in the first place, and that would be a good thing.

crest
0 replies
1d4h

I assume it's supposed to work by having the compiler track the lifetime of the inodes. The compiler is expected to help with ephemeral references (the file system still has to store the link count to disk).

DSMan195276
0 replies
1d1h

but still use the same functions for inode lifecycle management

I'm not an expert by any means but I'm somewhat knowledgeable, there's different functions that can be used to create inodes and then insert them into the cache. `iget_locked()` that's focused on here is a particular pattern of doing it, but not every FS uses that for one reason or another (or doesn't use it in every situation). Ex: FAT doesn't use it because the inode numbers get made-up and the FS maintains its own mapping of FAT position to inodes. There's then also file systems like `proc` which never cache their inode objects (I'm pretty sure that's the case, I don't claim to understand proc :P )

The inode objects themselves still have the same state flow regardless of where they come from, AFAIK, so from a consumer perspective the usage of the `inode` doesn't change. It's only the creation and internal handling of the inode objects by the FS layer that differs based on what the FS needs.

BiteCode_dev
5 replies
1d4h

Given how those discussions usually go, and the scale of the change, I find that discussion extraordinarily civil.

I disagree with the negative tone of this thread, I'm quite optimistic given how clearly the parties involved were able to communicate the pain points with zero BS.

nickparker
4 replies
1d3h

I found myself reading this more for the excellent notetaking than for the content.

I suspect the discussion was about as charged, meandering, and nitpicky as we all expect a PL debate among deeply opinionated geeks to be, and Jake Edge (who wrote this summary) is exceptionally good at removing all that and writing down substance.

BiteCode_dev
3 replies
1d3h

Certainly.

We are talking about extremely competent people who worked on a critical piece of software for years and invested a lot of their lives in it, with all pain, effort, experience, and responsibilities that come with that.

That this debate is inscribed in a process that is still ongoing, and in fact, progressing, is a testament to how healthy the situation is.

I was expecting the whole Rust thing to be shut down 10 times, in a flow of distasteful remarks, already.

This means that not only Rust is vindicated as promising for the job, but both teams are willing and up to the task of working on the integration.

Those projects are exhausting, highly under-pressure situations, and they last a long time.

I still find that the report is showing a positive outcome. What do people expect? Move fast and break things?

A barrage of "no" is how it's supposed to go.

structural
0 replies
1d

I agree. And ideally, every time you raise the question and get the "no" response, you learn something about the system you're modifying or the reviewer learns something about your solution. Then you improve your solution, and come back.

Eventually consensus is built - either the solution becomes good enough, or both the developers and the reviewers agree that it's not going to work out and the line of development gets abandoned.

Large-scale change in production is hard, and messy, and involves a lot of imperfect humans that we hope are mostly well-intentioned.

emporas
0 replies
21h16m

Moving using a 70's technology breaks things. Rust is tested already on other OSes like Windows, Mac (or iOS) and Android and solves several pitfalls of C and C++. Some quotes from the Android team [1]:

"To date, there have been zero memory safety vulnerabilities discovered in Android’s Rust code."

"Safety measures make memory-unsafe languages slow"

Not saying Rust is the perfect solution to every problem, but it is definitely not an outlandish proposition to use it where it makes sense.

[1] https://security.googleblog.com/2022/12/memory-safe-language...

0cf8612b2e1e
0 replies
1d2h

I am definitely of the opinion we need to rush away from C. Rust, Go, Zig, etc does not matter, but anything which can catch some of the repeated mistakes that squishy humans keep repeating.

That being said, the file system is one of those infrastructure bits where you cannot make a mistake. Introduce a memory corruption bug leading to crashes every Thursday? Whatever. Total loss of data for 0.1% of users during a leap year at a total eclipse? Apocalypse.

There is no amount of being too careful when interfacing with storage. C may have a lot of foibles, but it is the devil we know.

sandywaffles
2 replies
1d3h

It wasn't clear to me, and I'm not familiar enough with the Linux FS systems to know, whether this Rust API would be wrapping or re-implementing the C APIs. If it's re-implementing (or rather an additional API), it seems keeping the names the same as the C API would be problematic and lead to more confusion over time, even if initially it helped already-familiar developers grok what's going on faster.

CGamesPlay
1 replies
1d3h

Almeida put up a slide with the equivalent of iget_locked() in Rust, which was called get_or_create_inode().

Seems like the answer is that it's reimplementing and doesn't use the same names.

swfsql
0 replies
1d2h

I'm not familiar with those functions, but I had the impression they actually shouldn't have the same name.

Since the Rust function has implicit/automatic behavior depending on what its state is and how it's used by the callsite, and since the C one doesn't have any implicit/automatic behavior (as in, separate/explicit lifecycle calls must be made "manually"), I don't even see the reason for them to have the same name.

That is to say, having the same name would be somehow wrong since the functions do and serve for different stuff.

But it would make sense, at least from the Rust side, to have documentation referring to the original C name.
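To illustrate what I mean by behavior that depends on state, here is a toy sketch. It is only loosely inspired by the function name quoted above; every type here is hypothetical and this is not the actual kernel API. The point is that the return type forces the caller to handle the "new, not yet initialised" case, and initialising consumes the handle so it can't be done twice.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins; the real kernel types look nothing like this.
struct Inode {
    ino: u64,
    size: u64,
}

/// A slot for an inode that was not in the cache and is not yet filled in.
struct NewInode<'a> {
    cache: &'a mut HashMap<u64, Inode>,
    ino: u64,
}

impl<'a> NewInode<'a> {
    /// Consumes the handle, so a half-initialised inode can never be used
    /// and initialisation cannot happen twice.
    fn init(self, size: u64) -> &'a Inode {
        let NewInode { cache, ino } = self;
        cache.entry(ino).or_insert(Inode { ino, size })
    }
}

/// The two states the caller must distinguish, encoded in the type.
enum Lookup<'a> {
    Cached(&'a Inode),
    New(NewInode<'a>),
}

struct InodeCache {
    inodes: HashMap<u64, Inode>,
}

impl InodeCache {
    fn get_or_create_inode(&mut self, ino: u64) -> Lookup<'_> {
        if self.inodes.contains_key(&ino) {
            Lookup::Cached(&self.inodes[&ino])
        } else {
            Lookup::New(NewInode { cache: &mut self.inodes, ino })
        }
    }
}

fn main() {
    let mut cache = InodeCache { inodes: HashMap::new() };
    for _ in 0..2 {
        match cache.get_or_create_inode(7) {
            Lookup::Cached(inode) => {
                println!("cached inode {} ({} bytes)", inode.ino, inode.size)
            }
            Lookup::New(new_inode) => {
                let inode = new_inode.init(4096);
                println!("initialised inode {} ({} bytes)", inode.ino, inode.size);
            }
        }
    }
}
```

In C the equivalent is one function plus a flag check the caller has to remember, which is why I don't think the two deserve the same name.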

brodouevencode
1 replies
1d3h

about the disconnect between the names in the C API and the Rust API, which means that developers cannot look at the C code and know what the equivalent Rust call would be

Ah, the struggle of legacy naming conventions. I've had success in keeping the same name but when I wanted an alternative name I would just wrap the old name with the new name.

But yeah, naming things is hard.

adastra22
0 replies
1d

One of the two major problems in computer science (the other two being concurrency and off-by-one errors).

simon04
0 replies
1d2h

tl;dr?

pjmlp
0 replies
1d5h

The disconnect section of the article is a good example of exactly how not to do things, and of how things can turn sour if the existing community isn't taken along for the ride.

hu3
0 replies
1d5h

Some of the comments below the lwn.net page are rather disrespectful.

Imagine getting this comment about the open source project you contribute to:

"Science advances one funeral at a time"