Issues with the article:
1. Only one example is given (web server), solved incorrectly for threads. I will elaborate below.
2. The question is framed as if people specifically want OS threads instead of async/await.
But programmers want threads conceptually, semantically: write sequential logic, without strange annotations like "async". In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls? Then you would suddenly be programming with threads.
OS threads are expensive due to their statically allocated stacks, and we don't want that. We want cheap threads that can be run by the millions on a single CPU, but without the clumsy "async/await" words. (The `wait` word remains in its classic sense: when you wait for an event, for another thread to complete, etc. - a blocking operation of waiting. We just don't want it for function invocations.)
Back to #1 - the web server example.
When a timeout is implemented in the async/await variant of the solution, using `driver.race(timeout).await`, what happens to the client socket after the `race` signals the timeout error? Does the socket remain open, still connected to the client - essentially leaked?
The timeout solution for the threaded version may look almost the same as the async/await one: `threaded_race(client_thread, timeout).wait`. This threaded_race function uses a timer to track the timeout in parallel with the thread, and when the timeout is reached it calls `client_thread.interrupt()` - the Java way. (`Thread.interrupt()`, if the thread is not blocked, simply sets a flag; and if the thread is blocked in an IO call, that call throws an InterruptedException. That's a checked exception, so the compiler forces the programmer to wrap `client.read_to_end(&mut data)` in try / catch or declare the exception on `handle_client`. So the programmer will not forget to close the client socket.)
Some programmers do, but many want exactly the opposite as well. Most of the time I don't care whether it's an OS blocking syscall or a non-blocking one, but I do care about understanding the control flow of the program I'm reading, seeing where the waiting happens, and knowing how to make those waits run concurrently.
In fact, I'd kill to have a blocking/block keyword pair whenever I'm working with blocking functions, because they can surreptitiously slow down everything without you paying attention (I can't count how many pieces of software I've seen with blocking syscalls in the UI thread, leading to frustratingly slow apps!).
But all functions are blocking.
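(The snippet this refers to isn't preserved here; a minimal reconstruction, since later comments describe `bar` as simply adding two numbers:)

```rust
// A plain function that adds two numbers -- the `bar` referred to below.
fn bar(a: u64, b: u64) -> u64 {
    a + b
}
```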
Here `bar` is a blocking function.

No they aren't, and that's exactly my point.
Most functions aren't doing any syscall at all, and as such they are neither blocking nor non-blocking.
Now, because of path dependency and because we've been using blocking functions like regular functions, we're accustomed to thinking that blocking is "normal", but that's actually a source of bugs, as I mentioned before. In reality, async functions are more "normal" than regular functions: they don't do anything fancy, they just return a value when you call them, and what they return is a future/promise. In fact you don't even need any async annotation for a function to be async in Rust; this is an async function:
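(The original snippet isn't preserved in this thread; a minimal sketch of what it might have looked like:)

```rust
use std::future::Future;

// Not marked `async`, yet calling it produces a future,
// so it is async in every practical sense.
fn foo() -> impl Future<Output = u32> {
    async { 42 }
}
```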
The async keyword exists simply so that the compiler knows it has to desugar the await inside the function into a state machine. But since Rust has async blocks, it doesn't even need async on functions at all; the information you need comes from the type of the return value, which is a future.

Blocking functions, on the contrary, are utterly bizarre. In fact, you cannot make one yourself: you must either call another blocking function[1] or do a system call on your own using inline assembly. Blocking functions are the anomaly, but many people miss that because they've lived with them long enough to accept them as normal.
[1] because blockingness is contagious, unlike asynchronousness, which must be propagated manually; yes, ironically, people criticizing async/await get this one backwards too
bar blocks waiting for the CPU to add the numbers.
Nope, it doesn't. In the final binary the bar function doesn't even exist anymore, as the optimizer inlined it, and CPUs have been using pipelining and speculative execution for decades now; they don't block on a single instruction. That's the problem with abstractions designed in the 70s: they don't map well onto the actual hardware we have 50 years later…
Make `a + b` into `A * B` then: multiplication of two potentially huge matrices. The same argument still holds, but now it's blocking (it's still just performing additions, only an enormous number of them).
It's not blocking, it's doing actual work.
Blocking is the way the old programming paradigm deals with asynchronous actions, and it works by behaving the same way as when the computer actually computes things, so that's where the confusion comes from. But the two situations are conceptually very different: in one case we are idle (but don't see it), in the other we're busy doing actual work. Maybe in case 2 we could optimize the algorithm so that we spend less time, but that's not certain, whereas in case 1 there's something obvious to do to speed things up: do something else at the same time instead of waiting mindlessly. Having a function marked async gives you a pointer that you can actually run it concurrently with something else and expect a speedup, whereas with a blocking syscall there's no indication in the code that those two functions you're calling next to each other, with no data dependency between them, would gain a lot from being run concurrently by spawning two threads.
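To make that last point concrete, here is a minimal sketch (tokio and the 100 ms delays are my own illustrative assumptions, not from the comment): two independent waits written sequentially take twice as long as when they are explicitly joined.

```rust
use tokio::time::{sleep, Duration};

async fn fetch_a() -> u32 { sleep(Duration::from_millis(100)).await; 1 }
async fn fetch_b() -> u32 { sleep(Duration::from_millis(100)).await; 2 }

#[tokio::main]
async fn main() {
    // Sequential: the two waits add up (~200 ms).
    let (a, b) = (fetch_a().await, fetch_b().await);

    // Concurrent: the waits overlap (~100 ms), and the code says so explicitly.
    let (a2, b2) = tokio::join!(fetch_a(), fetch_b());

    println!("{a} {b} {a2} {b2}");
}
```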
BTW, if you want something that's more akin to blocking, but at a lower level, it's when the CPU has to load data from RAM: it's really blocked, doing nothing useful. Unfortunately that's not something you can make explicit in high-level languages (or at least, the design space hasn't been explored), so when this kind of behavior matters to you, that's when you dive down to assembly.
A "non-blocking function" always meant "this function will return before its work is done, and will finish that work in the background through threads/other processes/etc". All other functions are blocking by default, including that simple addition "bar" function above.
Your definition is at odds with (for instance) JavaScript's implementation of a non-blocking function though, as it will perform computation until the first await point before returning (unlike Rust futures, which are lazy, that is: they do no work before they are awaited).
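To make the Rust half of that concrete, a minimal sketch of future laziness (the `futures` crate and the names here are illustrative assumptions):

```rust
use futures::executor::block_on; // used here only to run the future at the end

async fn greet() {
    println!("hello from the future");
}

fn main() {
    let fut = greet();                      // nothing printed: the body hasn't started running
    println!("future created, not polled");
    block_on(fut);                          // only now does "hello from the future" appear
}
```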
As I said before, most of what you call a "blocking function" is actually a "no-opinion function", but since, in the idiosyncrasy of most programming languages, blocking functions are called like "no-opinion" ones, you are mixing them up. But it's not a fundamental rule. You could imagine a language where blocking functions (those containing an underlying blocking syscall) are called with a block keyword and where regular functions are just called like functions. There's no relation between regular functions and blocking functions except the path dependency that led to this particular idiosyncrasy we live in; it is entirely contingent.
Yes, that's syntactic sugar for returning a promise. This pattern is something we've long called a non-blocking function in JavaScript. The first part, which isn't in the promise, is for setting it up.
I don't know what to tell you, but that is how sequential code works. Sure, you can find some instruction-level parallelism in the code, and your optimizer may be able to do it across function boundaries, but that is mostly a happy accident. Meanwhile HDLs are the exact opposite: parallel by default, and you have to build sequential execution yourself. What is needed for both HLS and parallel programming is a parallel-by-default hybrid language that makes it easy to write both sequential and parallel code.
Except, unless you're using atomics or volatiles, you have no guarantees that the code you're writing sequentially is going to be executed that way…
Sure, unless it is the first time you are executing that line of code and you have to wait for the OS to slowly fault it in across a networked filesystem.
"makes certain syscalls" is a highly unconventional definition of "blocking" that excludes functions that spin wait until they can pop a message from a queue.
If your upcoming systems language uses a capabilities system to prevent the user from inadvertently doing things that may block for a long time, like calling open(2) or accessing any memory that is not statically proven not to cause a page fault, I look forward to using it. I hope these capabilities are designed so that the resulting code is more composable than Rust code. For example, it would be nice to be able to use the Reader trait with implementations that source their bytes in various different ways - something you cannot do in Rust today.
Blocking syscalls are a well-defined and well-scoped class of problems; sure, there are other situations where the flow stops, and a keyword can't save you from everything.
Your reasoning is exactly like that of the folks who say "Rust doesn't solve all bugs" because it "just" solves the memory safety ones.
I may be more serious than you think. Having worked on applications in which blocking for multiple seconds on a "non-blocking syscall" or page fault is not okay, I think it would really be nice to be able to statically ensure that doesn't happen.
I'm not disputing that; in the general case I suspect this is going to be undecidable, and you'd need careful design to carve out a subset of the problem that is statically addressable (akin to what Rust did for memory safety, by restricting the expressiveness of the safe subset of the language).
For blocking syscalls alone there's not that much PL research to do though, and we could get the improvement practically for free; that's why I consider them to be different problems (also because I suspect they are much more prevalent, given how often I've encountered them, but that could be a bias on my side).
Any function can block if memory it accesses is swapped out.
The difference is in quantities: bar blocks for nanoseconds, while the blocking the GP talks about affects the end user, which means it's measured in seconds.
This is a really common comment to see on HN threads about async/await vs fibers/virtual threads.
What you're asking for is for performance to be represented statically in the type system. "Blocking" is not a useful concept for this. As avodonosov is pointing out, nothing stops a syscall from being incredibly fast, or a regular function that doesn't talk to the kernel at all from being incredibly slow. The former won't matter for UI responsiveness; the latter will.
This isn't a theoretical concern. Historically, a slow class of functions involved reading/writing to the file system, but in some cases now you'll find that doing so is basically free, and you'll struggle to keep the storage device saturated without a lot of work on multi-threading. Fast NVMe SSDs like those found in enterprise storage products or MacBooks are a good example of this.
There are no languages that reify performance in the type system, partly because it would mean that optimizing a function might break the callers, which doesn't make sense, and partly because the performance of a function can vary wildly depending on the parameters it's given.
Async/await is therefore basically a giant hack and confuses people about performance. It also makes maintenance difficult. If you start with a function that isn't async and suddenly realize that you need to read something from disk during its execution, even if it only happens sometimes or when the user passes in a specific option, you are forced by the runtime to mark the function async which is then viral throughout the codebase, making it a breaking change - and for what? The performance impact of reading the file could easily be lost in the noise on modern hardware and operating systems.
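A hedged sketch of that maintenance problem (all names and the futures/tokio crates here are made up for illustration, not taken from anyone's codebase):

```rust
// A stand-in config type, just for illustration.
struct Config { verbose: bool }

// Before: a plain synchronous function, callable from anywhere.
fn load_config_sync() -> Config {
    Config { verbose: false }
}

// After deciding it might sometimes need to read a file via async IO,
// the signature changes...
async fn load_config() -> Config {
    // imagine e.g. tokio::fs::read_to_string("app.toml").await here
    Config { verbose: true }
}

// ...and every caller up the chain has to become async too (or block on a runtime):
async fn start() -> bool {
    load_config().await.verbose
}

fn main() {
    let _old = load_config_sync(); // the old version was callable from plain code...
    // ...but the async version forces a non-async caller to pull in an executor:
    let verbose = futures::executor::block_on(start());
    println!("verbose = {verbose}");
}
```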
The right way to handle this is the Java approach (by pron, who is posting in this thread). You give the developer threads and make it cheap to have lots of them. Now break down tasks into these cheap threads and let the runtime/OS figure out if it's profitable to release the thread stack or not. They're the best placed to do it because it's a totally dynamic decision that can vary on a case-by-case basis.
You'll typically have an idea of whether or not a function performs IO from the start. Changing that after the fact violates the users' conceptual model and expectation of it, even if all existing code happens to keep working.
I think GP's point is: why does that matter? Much writing on Async/Await roughly correlates IO with "slow". GP rightly points out that "slow" is imprecise, changes, means different things to different people and/or use cases.
I completely get the intuition: "there's lag in the [UI|server|...], what's slowing it down?". But the reality is that trying to formalise "slow" in the type system is nigh on impossible - because "slow" for one use case is perfectly acceptable for another.
While slowness in absolute terms depends on lots of factors, the relative slowness of things doesn't so much. Whatever the app or the device, accessing a register is always going to be faster than accessing random places in RAM, which is always going to be faster than fetching something from disk, and even more so if we talk about fetching stuff over the network. No matter how hardware progresses, the latency hierarchy is doomed to stay.
That doesn't mean it's the only factor of slowness, and that async/await solves all issues, but it's a tool that helps, a lot, to fight against very common sources of performance bugs (like how the borrow checker is useful when it protects against the nastiest class of memory vulnerabilities, even if it cannot solve all security issues).
Because the situation where “my program is stupidly waiting for some IO even though I don't even need the result right now and I could do something in the meantime” is something that happens a lot.
The network is special: the time it takes to fetch something over the network can be arbitrarily large, or even infinite (this can also apply to disk when running over networked filesystems), while for registers/RAM/disk (as long as it's a local disk which is not failing) the time it takes is bounded. That's the reason why async/await is so popular when dealing with the network.
PCIe is a network. USB is a network. There is no such thing as a resource with a guaranteed response time.
Even if you ignore performance completely, IO is unreliable. IO is unpredictable. IO should be scrutinized.
There are plenty of language ecosystems where there's no particular expectations up front about whether or when a library will do IO. Consider any library that introduces some sort of config file or registry keys, or an OS where a function that was once purely in-process is extracted to a sandboxed extra process.
There are languages that don't enforce the expectation on a type level, but that doesn't mean that people don't have expectations.
Yeah, please don't do this behind my back. Load during init, and ask for permission first (by making me call something like Config::load() if I want to respect it).
Slightly more reasonable, but this still introduces a lot of considerations that the application developer needs to be aware of (how should the library find its helper binary? what if the sandboxing mechanism fails or isn't available?).
For the sandbox example I was thinking of desktop operating systems where things like file IO can become brokered without apps being aware of it. So the API doesn't change, but the implementation introduces IPC where previously there wasn't any. In practice it works fine.
If you want to go full Haskell on the problem for purity-related reasons, by all means be my guest. I strongly approve.
However, unless you're in such a language, warping my entire architecture around that objection does not provide a good cost-benefit tradeoff. I've got a lot of fish to fry, and a lot of them are bigger than this in practice. Heck, there are still plenty of programmers who consider it an unambiguous feature that they can add IO to anything they want or need to, and a huge negative when they can't, and a lot of programmers who don't practice IO isolation and don't even conceive of "this function is guaranteed not to do any IO / be impure" as a property a function can have.
You can't encode everything about performance in the type system, but that doesn't mean you cannot do it at all: having a type system that allows you to control memory layout and allocation is what makes C++ and Rust faster than most languages. And regarding what you say about storage access: storage bandwidth is now high, but latency when accessing an SSD is still much higher than accessing RAM, and network is even worse. And it will always be the case no matter what progress hardware makes, because of the speed of light.
Saying that async/await doesn't help with all performance issues is like saying Rust doesn't prevent all bugs: the statement is technically correct, but that doesn't make it interesting.
Many developers have embraced the async/await model with delight, because it instead makes maintenance easier by making the intent of the code more explicit.
It's been trendy on HN to bash async/await, but you are missing the most crucial point about software engineering: code is written for humans and is read much more than it is written. Async/await may be slightly more tedious to write (it's highly context dependent though; when you have concurrent tasks to execute or need cancellation, it becomes much easier with futures).
No it's not, and Mr Pressler has repeatedly shown that he misses the social and communication aspects, so it's not entirely surprising.
It has been tried various times in the last decades. You want to search for "RPC". All attempts at trying to unify sync and async have failed, because there is a big semantic difference between running code within a thread, between threads, or even between computers. Trying to abstract over that will eventually prove insufficient. So better to learn how to do it properly from the beginning.
I think you've got some of this in your own reply, but... I feel like Erlang has gone all in on "if async is good, why not make everything async". "Everything" in Erlang is built on top of async message passing, or the appearance thereof. Erlang hasn't taken over the world, but I think it's still successful; chat services descended from ejabberd have taken over the world, and RabbitMQ seems pretty popular too. OTOH, the system as a whole only works because Erlang can be effectively preemptive in green threads thanks to the nature of the language. Another thing to note is that you can build the feeling of synchronous calling by sending a request and immediately waiting for a response, but it's very hard to go the other way. If you build your RPC system on the basis of synchronous calls, it's going to be painful - sometimes you want to start many calls and then wait for the responses together, and that gets real messy if you have to spawn threads/tasks every time.
I'm not very familiar with Erlang, but from my understanding, Erlang actually does have this very distinction - you either run local code or you interact with other actors. And here the big distinction gets quite clear: once you shoot a message out, you don't know what will happen afterwards. Both you or the other actor might crash and/or send other messages etc.
So Erlang does not try to hide it; instead, it asks the developer to embrace it, and it's one of its strengths.
That being said, I think that actors are a great way to model a system from a bird's-eye perspective, but they're not so great for handling concurrency within a single actor. I wish Erlang would improve here.
Actors are a building block of concurrency. IMHO, it doesn't make sense to have concurrency within an actor, other than maybe instruction-level concurrency. But that's very much out of scope for Erlang. BEAM code does compile (JIT) to native code on amd64 and arm64, but the JIT is optimized for speed, since it happens at code load time; it's not a profiling/optimizing JIT like Java's HotSpot. There's no register scheduler like you'd need to achieve concurrency - all the BEAM ops end up using the same registers (more or less), although your processor may be able to do magic with register renaming and out-of-order operations in general.
If you want instruction level concurrency, you should probably be looking into writing your compute heavy code sections as Native Implemented Functions (NIFs). Let Erlang wrangle your data across the wire, and then manipulate it as you need in C or Rust or assembly.
I think it makes sense to have that, including managing the communication with other actors. Things like "I'll send the message, and if I don't hear back within x minutes, I'll send this other message".
Actors are very powerful and a great tool to have at your disposal, but often they are too powerful for the job and then it can be better to fall back to a more "low level" or "local" type of concurrency management.
At least that's how I feel. In my opinion you need both, and while you can get the job done with just one of them (or even none), it's far from being optimal.
Also, what you mention about NIFs is good for a very specific use case (high performance / parallelism), but concurrency has a broader scope.
I assume you don't want to wait with an x-minute timeout (and do nothing in the meantime). You can manage this in three ways, really:
a) you could spawn an actor to send the message and wait for a response and then take the fallback action.
b) you could keep a list (or other structure, whatever) of outstanding messages and timeouts, and prune the list if you get a response, or otherwise periodically check if there's a timeout to process.
c) set a timer and do the thing when you get the timer expiration message, or cancel the timer if you get a response. (which is conceptually done by sending a message to the timer server actor, which will send you a timer handle immediately and a timer expired message later; there is a timer server you can use through the timer module, but erlang:send_after/[2,3] or erlang:start_timer/[3,4] are more efficient, because the runtime provides a lot of native timer functionality as needed for timeouts and what not anyway)
Setting up something to 'automatically' do something later means asking the question of how the Actor's state is managed concurrently, and the thing that makes Actors simple is being able to answer that the Actor always does exactly one thing at a time, and that the Actor cannot be interrupted, although it can be killed in an orderly fashion at any time, at least in theory. Sometimes the requirement for an orderly death means an operation in progress must finish before the process can be killed.
Exactly. Now, a) is unnecessarily powerful. I don't want to manage my own list as in b), but other than that, b) sounds fine, and c) is also fine - though, does it need an actor in the background? No.
In other words, having a well built concept for these cases is important. At least that's my take. You might say "I'll just use actors and be fine", but for me it's not sufficient.
Oh, and just to add onto it: I think async/await is not really the best solution to tackle these semantic differences. I prefer the green-thread-IO approach, which feels a bit heavier but leads to a true understanding of how to combine and control logic in a concurrent/parallel setting. Async/await is great to add to languages that already have something like promises and want to improve syntax in an easy way, so it has its place - but I think it was not the best choice for Rust.
Any internal race() values will be dropped and driver itself will remain (although Rust will complain that you are not handling the Result if you type it 'as is'); if a new socket was created local to the future, it will be cleaned up.
The nice thing about futures (in Rust) is that all the behavior around them can be defined. While "all functions are blocking", as you state in a sibling comment, Rust allows you to specify when to defer execution to the next task in the task queue, meaning it will poll tasks arbitrarily quickly with explicitly held state (the Future struct). This makes it both very fast (compared to threads, which need to sleep() in order to defer) and easy to reason about.
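A hedged sketch of that drop-on-cancellation behavior, using tokio's `timeout` as a stand-in for the article's `race` (none of this is the article's actual code; `serve_one` would be called once per accepted connection):

```rust
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;
use tokio::time::{timeout, Duration};

async fn handle_client(mut client: TcpStream) -> std::io::Result<Vec<u8>> {
    let mut data = Vec::new();
    client.read_to_end(&mut data).await?; // suspends here while waiting for the client
    Ok(data)
}

async fn serve_one(client: TcpStream) {
    // If the deadline fires first, the inner future is dropped, and with it the
    // TcpStream it owns -- the socket is closed, not leaked.
    match timeout(Duration::from_secs(5), handle_client(client)).await {
        Ok(Ok(data)) => println!("read {} bytes", data.len()),
        Ok(Err(e)) => eprintln!("io error: {e}"),
        Err(_elapsed) => eprintln!("client timed out"),
    }
}
```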
Java's Thread.interrupt is also just a sleep loop, which is fine for most applications, to be fair. Rust is a systems language: you can't have that in embedded systems, and it's not desirable for kernels or low-latency applications.
You probably mean that Java's socket reading under the hood may start a non-blocking IO operation on the socket and then run a loop, which can react to Thread.interrupt() (which, in turn, basically just sets a flag).
But that's an implementation detail, and it does not need to be implemented that way.
It can be implemented the same way as async/await. When a thread calls socket reading, the runtime system takes the current thread's continuation off the CPU and uses the CPU to execute the next task in the queue. (That's how Java's new virtual threads are implemented.)
Threads and async/await are basically the same thing.
So why not drop this special word `async`?
You can drop the special word in Rust - it's just sugar for 'returns a poll-able function with state' - however, threads and async/await are not the same.
You can implement concurrency any way you like - you can run it in separate processes or on separate nodes if you are willing to put in the work - but that does not mean they are equivalent for most purposes.
Threads are almost always implemented preemptively while async is typically cooperative. Threads are heavy/costly in time and memory, while async is almost zero-cost. Threads are handed over to the kernel scheduler, while async is entirely controlled by the program('s executor).
Purely from a merit perspective, threads are simply a different trade-off - just like multi-processing and the distributed actor model are.
Keyword here being almost. See Project Loom.
@f_devd, cooperative vs preemptive is a good point.
(That threads are heavy or should be scheduled by the OS is not required by the nature of threads.)
But preemptive is strictly better (safer at least) than cooperative, right? Otherwise, one accidental endless loop, and this code occupies the executor, depriving all other futures of execution.
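A hedged sketch of exactly that failure mode, assuming tokio's single-threaded (current_thread) runtime; note that the program deliberately never prints "tick" and never terminates - that is the point:

```rust
use std::time::Duration;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // One accidental busy loop with no .await: this task never yields...
    tokio::spawn(async {
        loop {
            std::hint::spin_loop();
        }
    });

    // ...so this task never gets polled and "tick" is never printed.
    tokio::spawn(async {
        loop {
            tokio::time::sleep(Duration::from_millis(100)).await;
            println!("tick");
        }
    });

    // Even main's own sleep never completes: the whole executor is starved.
    tokio::time::sleep(Duration::from_secs(1)).await;
    println!("done"); // never reached
}
```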
@gpderetta, I think Project Loom will need to become preemptive; otherwise virtual threads cannot be used as a drop-in replacement for native threads - we will have deadlocks in virtual threads where they don't happen in native threads.
Preemptive is safer for liveness, since it avoids 'starvation' (one task's poll taking too long); however, in practice it is almost always more expensive in memory and time due to the implicit state.
In async, only the values required to do a poll need to be held (often only references), while for threads the entire stack & registers need to be stored at all times, since at any moment the thread could be interrupted and it will need to know where to continue from. And since it needs to save/overwrite all registers at each context switch (+ scheduler/kernel handling), it takes more time overall.
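A hedged sketch of that difference in held state (standard library only; the exact number printed will vary by compiler version):

```rust
use std::mem::size_of_val;

async fn small_state() {
    let buf = [0u8; 16];
    std::future::ready(()).await; // `buf` is the only data that must survive this await
    std::hint::black_box(&buf);
}

fn main() {
    let fut = small_state();
    // The future's size is roughly its live-across-await state plus bookkeeping --
    // tens of bytes here -- whereas an OS thread reserves a full stack
    // (commonly 1-8 MiB of address space) plus saved registers.
    println!("future size: {} bytes", size_of_val(&fut));
}
```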
In general, threads are a good option if you can afford the overhead, but assuming threads as the default can significantly hinder performance (or make it nearly impossible to even run) in the places where Rust needs to.
Java can afford that. M:N threads come with a heavy runtime. Java already has a heavy runtime, so what's a smidgen more flab?
Source: https://github.com/rust-lang/rfcs/blob/master/text/0230-remo...
So it seems the biggest issue was having a single IO interface, forcing overhead on both green and native threads and forcing runtime dispatching.
It seems to me that the best approach would have been to let the two libraries evolve separately and capture the common subset in a trait (possibly using dyn impls when type erasure is tolerable), so that you could write generic code that works with both, or specialized code to take advantage of specific features.
As it stands now, sync and async are effectively separated anyway, and it is currently impossible to write generic code that handles both.
Why not go one step further and invent "Parallel Rust"? And by parallel I mean it. Just a nice little keyword "parallel {}" where every statement inside the parallel block is executed in parallel, the same way it is done in HDLs. Rust's borrow checker should be able to ensure parallel code is safe. Of course one problem with this strategy is that we don't exactly have processors that are designed to spawn and process micro-threads. You would need to go back all the way to Sun's SPARC architecture for that and then extend it with the concept of a tree based stack so that multiple threads can share the same stack.
That would be a good step forward, I support it :)
BTW, do we need the `parallel` keyword, or better to simply let all code be parallel by default?
Haskell has entered the chat…
However, almost all of the most popular programming languages are imperative. I assume most programmers prefer to think of our programs as a series of steps which execute in sequence.
Mind you, arguably Excel is the most popular programming language in use today, and it has exactly this execution model.
The rayon crate lets you do something quite similar.
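For instance, a minimal sketch (rayon is an external crate you would add as a dependency; the workloads here are arbitrary):

```rust
use rayon::prelude::*;

fn main() {
    // Two independent computations run in parallel; join returns both results.
    let (sum, max) = rayon::join(
        || (1..=1_000_000u64).sum::<u64>(),
        || (1..=1_000_000u64).max(),
    );
    println!("sum = {sum}, max = {max:?}");

    // Or parallelize a whole iterator pipeline across the thread pool.
    let squares: Vec<u32> = (1u32..=10).into_par_iter().map(|x| x * x).collect();
    println!("{squares:?}");
}
```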
I believe the answer is "that implies a runtime", and Rust as a whole is not willing to pull that up into a language requirement.
This is in contrast to Haskell, Go, dynamic scripting languages, and, frankly, nearly every other language on the market. Almost everything has a runtime nowadays, and while each individually may be fine they don't always play well together. It is important that as C rides into the sunset (optimistic and aspirational, sure, but I hope and believe also true) and C++ becomes an ever more complex choice to make for various reasons that we have a high-power very systems-oriented programming language that will make that choice, because someone needs to.
You do not need to spawn threads/tasks eagerly. You can do it lazily via work stealing. See Cilk++.
Doesn't rayon have a syntax like that?
There's also writing your code with poll() and select(), which is its own thing.
Well, that's the great thing with async Rust: you write with poll and select without writing poll and select. Let the computer and the compiler get this detail out of my way (seriously, I don't want to manage the fd interest list myself).
And I can still write conceptually similar select code using the, well, select! macro provided by most async runtimes to do the same on a list of futures. Better separation, easier to read, and overall it boils down to the same thing.
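A hedged sketch with tokio's select! (any runtime's equivalent macro would look similar; the timers just stand in for real IO):

```rust
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    // Wait on several futures at once and run the branch of whichever completes first;
    // the losing future is simply dropped (cancelled).
    tokio::select! {
        _ = sleep(Duration::from_millis(10))  => println!("fast timer won"),
        _ = sleep(Duration::from_millis(100)) => println!("slow timer won"),
    }
}
```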
You mean like Haskell?
The answer is that you need an incredibly good compiler to make this behave adequately, and even then, every once in a while you'll get the wrong behavior and need to rewrite your code in a weird way.
IIRC withoutboats said in one of the posts that the true answer is compatibility with C.