Issues with the article:
1. Only one example is given (web server), solved incorrectly for threads. I will elaborate below.
2. The question is framed as if people specifically want OS threads instead of async/await.
But programmers want threads conceptually, semantically: write sequential logic, without strange annotations like "async". In other words, if async/await is so good, why not make all functions in the language implicitly async, and instead of "await" just write normal function calls? Then you would suddenly be programming with threads.
OS threads are expensive due to their statically allocated stacks, and we don't want that. We want cheap threads that can be run by the millions on a single CPU, but without the clumsy "async/await" words. (The `wait` word remains in its classic sense: when you wait for an event, for another thread to complete, etc. - a blocking operation of waiting. We just don't want it for function invocations.)
Back to #1 - the web server example.
When a timeout is implemented in the async/await variant of the solution, using `driver.race(timeout).await`, what happens to the client socket after the `race` signals the timeout error? Does the socket remain open, still connected to the client - essentially leaked?
The timeout solution for the threaded version may look almost the same as the async/await one: `threaded_race(client_thread, timeout).wait`. This threaded_race function uses a timer to track the timeout in parallel with the thread, and when the timeout is reached it calls `client_thread.interrupt()` - the Java way. (`Thread.interrupt()`, if the thread is not blocked, simply sets a flag; and if the thread is blocked in an IO call, that call throws an InterruptedException. That's a checked exception, so the compiler forces the programmer to wrap `client.read_to_end(&mut data)` in try / catch or declare the exception on `handle_client`. So the programmer will not forget to close the client socket.)
Some programmers do, but many want exactly the opposite as well. Most of the time I don't care whether it's an OS blocking syscall or a non-blocking one, but I do care about understanding the control flow of the program I'm reading, seeing where the waiting happens, and knowing how to make those waits run concurrently.
In fact, I'd kill to have a blocking/block keyword pair whenever I'm working with blocking functions, because they can surreptitiously slow down everything without you paying attention (I can't count how many pieces of software I've seen with blocking syscalls in the UI thread, leading to frustratingly slow apps!).
But all functions are blocking.
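(The snippet this refers to isn't preserved here; a minimal reconstruction, since later comments describe `bar` as simply adding two numbers:)

```rust
// A plain function that adds two numbers -- the `bar` referred to below.
fn bar(a: u64, b: u64) -> u64 {
    a + b
}
```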
Here `bar` is a blocking function.

No they aren't, and that's exactly my point.
Most functions aren't doing any syscall at all, and as such they are neither blocking nor non-blocking.
Now, because of path dependency and because we've been using blocking functions like regular functions, we're accustomed to thinking that blocking is "normal", but that's actually a source of bugs, as I mentioned before. In reality, async functions are more "normal" than regular functions: they don't do anything fancy, they just return a value when you call them, and what they return is a future/promise. In fact you don't even need any async annotation for a function to be async in Rust; this is an async function:
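(The original snippet isn't preserved in this thread; a minimal sketch of what it might have looked like:)

```rust
use std::future::Future;

// Not marked `async`, yet calling it produces a future,
// so it is async in every practical sense.
fn foo() -> impl Future<Output = u32> {
    async { 42 }
}
```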
The async keyword exists simply so that the compiler knows it has to desugar the await inside the function into a state machine. But since Rust has async blocks, it doesn't even need async on functions at all; the information you need comes from the type of the return value, which is a future.

Blocking functions, on the contrary, are utterly bizarre. In fact, you cannot make one yourself: you must either call another blocking function[1] or do a system call on your own using inline assembly. Blocking functions are the anomaly, but many people miss that because they've lived with them long enough to accept them as normal.
[1] because blockingness is contagious, unlike asynchronousness, which must be propagated manually; yes, ironically, people criticizing async/await get this one backwards too
bar blocks waiting for the CPU to add the numbers.
Nope, it doesn't. In the final binary the bar function doesn't even exist anymore, as the optimizer inlined it, and CPUs have been using pipelining and speculative execution for decades now; they don't block on a single instruction. That's the problem with abstractions designed in the 70s: they don't map well onto the actual hardware we have 50 years later…
Make `a + b` into `A * B` then: multiplication of two potentially huge matrices. The same argument still holds, but now it's blocking (it's still just performing additions, only an enormous number of them).
It's not blocking, it's doing actual work.
Blocking is the way the old programming paradigm deals with asynchronous actions, and it works by behaving the same way as when the computer actually computes things, so that's where the confusion comes from. But the two situations are conceptually very different: in one case we are idle (but don't see it), in the other we're busy doing actual work. Maybe in case 2 we could optimize the algorithm so that we spend less time, but that's not certain, whereas in case 1 there's something obvious to do to speed things up: do something else at the same time instead of waiting mindlessly. Having a function marked async gives you a pointer that you can actually run it concurrently with something else and expect a speedup, whereas with a blocking syscall there's no indication in the code that those two functions you're calling next to each other, with no data dependency between them, would gain a lot from being run concurrently by spawning two threads.
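To make that last point concrete, here is a minimal sketch (tokio and the 100 ms delays are my own illustrative assumptions, not from the comment): two independent waits written sequentially take twice as long as when they are explicitly joined.

```rust
use tokio::time::{sleep, Duration};

async fn fetch_a() -> u32 { sleep(Duration::from_millis(100)).await; 1 }
async fn fetch_b() -> u32 { sleep(Duration::from_millis(100)).await; 2 }

#[tokio::main]
async fn main() {
    // Sequential: the two waits add up (~200 ms).
    let (a, b) = (fetch_a().await, fetch_b().await);

    // Concurrent: the waits overlap (~100 ms), and the code says so explicitly.
    let (a2, b2) = tokio::join!(fetch_a(), fetch_b());

    println!("{a} {b} {a2} {b2}");
}
```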
BTW, if you want something that's more akin to blocking, but at a lower level, it's when the CPU has to load data from RAM: it's really blocked, doing nothing useful. Unfortunately that's not something you can make explicit in high-level languages (or at least, the design space hasn't been explored), so when this kind of behavior matters to you, that's when you dive down to assembly.
A "non-blocking function" always meant "this function will return before its work is done, and will finish that work in the background through threads/other processes/etc". All other functions are blocking by default, including that simple addition "bar" function above.
Your definition is at odds with (for instance) JavaScript's implementation of a non-blocking function though, as it will perform computation until the first await point before returning (unlike Rust futures, which are lazy, that is: they do no work before they are awaited).
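To make the Rust half of that concrete, a minimal sketch of future laziness (the `futures` crate and the names here are illustrative assumptions):

```rust
use futures::executor::block_on; // used here only to run the future at the end

async fn greet() {
    println!("hello from the future");
}

fn main() {
    let fut = greet();                      // nothing printed: the body hasn't started running
    println!("future created, not polled");
    block_on(fut);                          // only now does "hello from the future" appear
}
```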
As I said before, most of what you call a "blocking function" is actually a "no-opinion function", but since, in the idiosyncrasy of most programming languages, blocking functions are called like "no-opinion" ones, you are mixing them up. But it's not a fundamental rule. You could imagine a language where blocking functions (those containing an underlying blocking syscall) are called with a block keyword and where regular functions are just called like functions. There's no relation between regular functions and blocking functions except the path dependency that led to this particular idiosyncrasy we live in; it is entirely contingent.
Yes, that's syntactic sugar for returning a promise. This pattern is something we've long called a non-blocking function in JavaScript. The first part, which isn't in the promise, is for setting it up.
I don't know what to tell you, but that is how sequential code works. Sure, you can find some instruction-level parallelism in the code, and your optimizer may be able to do it across function boundaries, but that is mostly a happy accident. Meanwhile HDLs are the exact opposite: parallel by default, and you have to build sequential execution yourself. What is needed for both HLS and parallel programming is a parallel-by-default hybrid language that makes it easy to write both sequential and parallel code.
Except, unless you're using atomics or volatiles, you have no guarantees that the code you're writing sequentially is going to be executed that way…
Sure, unless it is the first time you are executing that line of code and you have to wait for the OS to slowly fault it in across a networked filesystem.
"makes certain syscalls" is a highly unconventional definition of "blocking" that excludes functions that spin wait until they can pop a message from a queue.
If your upcoming systems language uses a capabilities system to prevent the user from inadvertently doing things that may block for a long time, like calling open(2) or accessing any memory that is not statically proven not to cause a page fault, I look forward to using it. I hope these capabilities are designed so that the resulting code is more composable than Rust code. For example, it would be nice to be able to use the Reader trait with implementations that source their bytes in various different ways - something you cannot do in Rust today.
Blocking syscalls are a well-defined and well-scoped class of problems; sure, there are other situations where the flow stops, and a keyword can't save you from everything.
Your reasoning is exactly like that of the folks who say "Rust doesn't solve all bugs" because it "just" solves the memory safety ones.
I may be more serious than you think. Having worked on applications in which blocking for multiple seconds on a "non-blocking syscall" or page fault is not okay, I think it would really be nice to be able to statically ensure that doesn't happen.
I'm not disputing that; in the general case I suspect this is going to be undecidable, and you'd need careful design to carve out a subset of the problem that is statically addressable (akin to what Rust did for memory safety, by restricting the expressiveness of the safe subset of the language).
For blocking syscalls alone there's not that much PL research to do though, and we could get the improvement practically for free; that's why I consider them to be different problems (also because I suspect they are much more prevalent, given how often I've encountered them, but that could be a bias on my side).
Any function can block if memory it accesses is swapped out.
The difference is in quantities: bar blocks for nanoseconds, while the blocking the GP talks about affects the end user, which means it's measured in seconds.
This is a really common comment to see on HN threads about async/await vs fibers/virtual threads.
What you're asking for is for performance to be represented statically in the type system. "Blocking" is not a useful concept for this. As avodonosov is pointing out, nothing stops a syscall from being incredibly fast, or a regular function that doesn't talk to the kernel at all from being incredibly slow. The former won't matter for UI responsiveness; the latter will.
This isn't a theoretical concern. Historically, a slow class of functions involved reading/writing to the file system, but in some cases now you'll find that doing so is basically free, and you'll struggle to keep the storage device saturated without a lot of work on multi-threading. Fast NVMe SSDs like those found in enterprise storage products or MacBooks are a good example of this.
There are no languages that reify performance in the type system, partly because it would mean that optimizing a function might break the callers, which doesn't make sense, and partly because the performance of a function can vary wildly depending on the parameters it's given.
Async/await is therefore basically a giant hack and confuses people about performance. It also makes maintenance difficult. If you start with a function that isn't async and suddenly realize that you need to read something from disk during its execution, even if it only happens sometimes or when the user passes in a specific option, you are forced by the runtime to mark the function async which is then viral throughout the codebase, making it a breaking change - and for what? The performance impact of reading the file could easily be lost in the noise on modern hardware and operating systems.
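A hedged sketch of that maintenance problem (all names and the futures/tokio crates here are made up for illustration, not taken from anyone's codebase):

```rust
// A stand-in config type, just for illustration.
struct Config { verbose: bool }

// Before: a plain synchronous function, callable from anywhere.
fn load_config_sync() -> Config {
    Config { verbose: false }
}

// After deciding it might sometimes need to read a file via async IO,
// the signature changes...
async fn load_config() -> Config {
    // imagine e.g. tokio::fs::read_to_string("app.toml").await here
    Config { verbose: true }
}

// ...and every caller up the chain has to become async too (or block on a runtime):
async fn start() -> bool {
    load_config().await.verbose
}

fn main() {
    let _old = load_config_sync(); // the old version was callable from plain code...
    // ...but the async version forces a non-async caller to pull in an executor:
    let verbose = futures::executor::block_on(start());
    println!("verbose = {verbose}");
}
```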
The right way to handle this is the Java approach (by pron, who is posting in this thread). You give the developer threads and make it cheap to have lots of them. Now break down tasks into these cheap threads and let the runtime/OS figure out if it's profitable to release the thread stack or not. They're the best placed to do it because it's a totally dynamic decision that can vary on a case-by-case basis.
You'll typically have an idea of whether or not a function performs IO from the start. Changing that after the fact violates the users' conceptual model and expectation of it, even if all existing code happens to keep working.
I think GP's point is: why does that matter? Much writing on Async/Await roughly correlates IO with "slow". GP rightly points out that "slow" is imprecise, changes, means different things to different people and/or use cases.
I completely get the intuition: "there's lag in the [UI|server|...], what's slowing it down?". But the reality is that trying to formalise "slow" in the type system is nigh on impossible - because "slow" for one use case is perfectly acceptable for another.
While slowness in absolute terms depends on lots of factors, the relative slowness of things doesn't so much. Whatever the app or the device, accessing a register is always going to be faster than accessing random places in RAM, which is always going to be faster than fetching something from disk, and even more so if we talk about fetching stuff over the network. No matter how hardware progresses, the latency hierarchy is doomed to stay.
That doesn't mean it's the only factor of slowness, and that async/await solves all issues, but it's a tool that helps, a lot, to fight against very common sources of performance bugs (like how the borrow checker is useful when it protects against the nastiest class of memory vulnerabilities, even if it cannot solve all security issues).
Because the situation where “my program is stupidly waiting for some IO even though I don't even need the result right now and I could do something in the meantime” is something that happens a lot.
The network is special: the time it takes to fetch something over the network can be arbitrarily large, or even infinite (this can also apply to disk when running over networked filesystems), while for registers/RAM/disk (as long as it's a local disk which is not failing) the time it takes is bounded. That's the reason why async/await is so popular when dealing with the network.
PCIe is a network. USB is a network. There is no such thing as a resource with a guaranteed response time.
Even if you ignore performance completely, IO is unreliable. IO is unpredictable. IO should be scrutinized.
There are plenty of language ecosystems where there's no particular expectations up front about whether or when a library will do IO. Consider any library that introduces some sort of config file or registry keys, or an OS where a function that was once purely in-process is extracted to a sandboxed extra process.
There are languages that don't enforce the expectation on a type level, but that doesn't mean that people don't have expectations.
Yeah, please don't do this behind my back. Load during init, and ask for permission first (by making me call something like Config::load() if I want to respect it).
Slightly more reasonable, but this still introduces a lot of considerations that the application developer needs to be aware of (how should the library find its helper binary? what if the sandboxing mechanism fails or isn't available?).
For the sandbox example I was thinking of desktop operating systems where things like file IO can become brokered without apps being aware of it. So the API doesn't change, but the implementation introduces IPC where previously there wasn't any. In practice it works fine.
If you want to go full Haskell on the problem for purity-related reasons, by all means be my guest. I strongly approve.
However, unless you're in such a language, warping my entire architecture around that objection does not provide a good cost-benefit tradeoff. I've got a lot of fish to fry, and a lot of them are bigger than this in practice. Heck, there are still plenty of programmers who consider it an unambiguous feature that they can add IO to anything they want or need to, and a huge negative when they can't, and a lot of programmers who don't practice IO isolation and don't even conceive of "this function is guaranteed not to do any IO / be impure" as a property a function can have.
You can't encode everything about performance in the type system, but that doesn't mean you cannot do it at all: having a type system that allows you to control memory layout and allocation is what makes C++ and Rust faster than most languages. And regarding what you say about storage access: storage bandwidth is now high, but latency when accessing an SSD is still much higher than accessing RAM, and network is even worse. And it will always be the case no matter what progress hardware makes, because of the speed of light.
Saying that async/await doesn't help with all performance issues is like saying Rust doesn't prevent all bugs: the statement is technically correct, but that doesn't make it interesting.
Many developers have embraced the async/await model with delight, because it instead makes maintenance easier by making the intent of the code more explicit.
It's been trendy on HN to bash async/await, but you are missing the most crucial point about software engineering: code is written for humans and is read much more than it is written. Async/await may be slightly more tedious to write (it's highly context dependent though; when you have concurrent tasks to execute or need cancellation, it becomes much easier with futures).
No it's not, and Mr Pressler has repeatedly shown that he misses the social and communication aspects, so it's not entirely surprising.
It has been tried various times in the last decades. You want to search for "RPC". All attempts at trying to unify sync and async have failed, because there is a big semantic difference between running code within a thread, between threads, or even between computers. Trying to abstract over that will eventually prove insufficient. So better to learn how to do it properly from the beginning.
I think you've got some of this in your own reply, but... I feel like Erlang has gone all in on "if async is good, why not make everything async". "Everything" in Erlang is built on top of async message passing, or the appearance thereof. Erlang hasn't taken over the world, but I think it's still successful; chat services descended from ejabberd have taken over the world, and RabbitMQ seems pretty popular too. OTOH, the system as a whole only works because Erlang can be effectively preemptive in green threads thanks to the nature of the language. Another thing to note is that you can build the feeling of synchronous calling by sending a request and immediately waiting for a response, but it's very hard to go the other way. If you build your RPC system on the basis of synchronous calls, it's going to be painful - sometimes you want to start many calls and then wait for the responses together, and that gets real messy if you have to spawn threads/tasks every time.
I'm not very familiar with Erlang, but from my understanding, Erlang actually does have this very distinction - you either run local code or you interact with other actors. And here the big distinction gets quite clear: once you shoot a message out, you don't know what will happen afterwards. Both you or the other actor might crash and/or send other messages etc.
So Erlang does not try to hide it; instead, it asks the developer to embrace it, and it's one of its strengths.
That being said, I think that actors are a great way to model a system from a bird's-eye perspective, but they're not so great for handling concurrency within a single actor. I wish Erlang would improve here.
Actors are a building block of concurrency. IMHO, it doesn't make sense to have concurrency within an actor, other than maybe instruction-level concurrency. But that's very much out of scope for Erlang. BEAM code does compile (JIT) to native code on amd64 and arm64, but the JIT is optimized for speed, since it happens at code load time; it's not a profiling/optimizing JIT like Java's HotSpot. There's no register scheduler like you'd need to achieve concurrency - all the BEAM ops end up using the same registers (more or less), although your processor may be able to do magic with register renaming and out-of-order operations in general.
If you want instruction level concurrency, you should probably be looking into writing your compute heavy code sections as Native Implemented Functions (NIFs). Let Erlang wrangle your data across the wire, and then manipulate it as you need in C or Rust or assembly.
I think it makes sense to have that, including managing the communication with other actors. Things like "I'll send the message, and if I don't hear back within x minutes, I'll send this other message".
Actors are very powerful and a great tool to have at your disposal, but often they are too powerful for the job and then it can be better to fall back to a more "low level" or "local" type of concurrency management.
At least that's how I feel. In my opinion you need both, and while you can get the job done with just one of them (or even none), it's far from being optimal.
Also, what you mention about NIFs is good for a very specific use case (high performance / parallelism), but concurrency has a broader scope.
I assume you don't want to wait with an x-minute timeout (and do nothing in the meantime). You can manage this in three ways, really:
a) you could spawn an actor to send the message and wait for a response and then take the fallback action.
b) you could keep a list (or other structure, whatever) of outstanding messages and timeouts, and prune the list if you get a response, or otherwise periodically check if there's a timeout to process.
c) set a timer and do the thing when you get the timer expiration message, or cancel the timer if you get a response. (which is conceptually done by sending a message to the timer server actor, which will send you a timer handle immediately and a timer expired message later; there is a timer server you can use through the timer module, but erlang:send_after/[2,3] or erlang:start_timer/[3,4] are more efficient, because the runtime provides a lot of native timer functionality as needed for timeouts and what not anyway)
Setting up something to 'automatically' do something later means asking the question of how the Actor's state is managed concurrently, and the thing that makes Actors simple is being able to answer that the Actor always does exactly one thing at a time, and that the Actor cannot be interrupted, although it can be killed in an orderly fashion at any time, at least in theory. Sometimes the requirement for an orderly death means an operation in progress must finish before the process can be killed.
Exactly. Now, a) is unnecessarily powerful. I don't want to manage my own list as in b), but other than that, b) sounds fine, and c) is also fine - though, does it need an actor in the background? No.
In other words, having a well built concept for these cases is important. At least that's my take. You might say "I'll just use actors and be fine", but for me it's not sufficient.
Oh, and just to add onto it: I think async/await is not really the best solution to tackle these semantic differences. I prefer the green-thread-IO approach, which feels a bit heavier but leads to a true understanding of how to combine and control logic in a concurrent/parallel setting. Async/await is great to add to languages that already have something like promises and want to improve syntax in an easy way, so it has its place - but I think it was not the best choice for Rust.
Any internal race() values will be dropped and driver itself will remain (although Rust will complain that you are not handling the Result if you type it 'as is'); if a new socket was created local to the future, it will be cleaned up.
The nice thing about futures (in Rust) is that all the behavior around them can be defined. While "all functions are blocking", as you state in a sibling comment, Rust allows you to specify when to defer execution to the next task in the task queue, meaning it will poll tasks arbitrarily quickly with explicitly held state (the Future struct). This makes it both very fast (compared to threads, which need to sleep() in order to defer) and easy to reason about.
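A hedged sketch of that drop-on-cancellation behavior, using tokio's `timeout` as a stand-in for the article's `race` (none of this is the article's actual code; `serve_one` would be called once per accepted connection):

```rust
use tokio::io::AsyncReadExt;
use tokio::net::TcpStream;
use tokio::time::{timeout, Duration};

async fn handle_client(mut client: TcpStream) -> std::io::Result<Vec<u8>> {
    let mut data = Vec::new();
    client.read_to_end(&mut data).await?; // suspends here while waiting for the client
    Ok(data)
}

async fn serve_one(client: TcpStream) {
    // If the deadline fires first, the inner future is dropped, and with it the
    // TcpStream it owns -- the socket is closed, not leaked.
    match timeout(Duration::from_secs(5), handle_client(client)).await {
        Ok(Ok(data)) => println!("read {} bytes", data.len()),
        Ok(Err(e)) => eprintln!("io error: {e}"),
        Err(_elapsed) => eprintln!("client timed out"),
    }
}
```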
Java's Thread.interrupt is also just a sleep loop, which is fine for most applications, to be fair. Rust is a systems language: you can't have that in embedded systems, and it's not desirable for kernels or low-latency applications.
You probably mean that Java's socket reading under the hood may start a non-blocking IO operation on the socket and then run a loop, which can react to Thread.interrupt() (which, in turn, basically just sets a flag).
But that's an implementation detail, and it does not need to be implemented that way.
It can be implemented the same way as async/await. When a thread calls socket reading, the runtime system takes the current thread's continuation off the CPU and uses the CPU to execute the next task in the queue. (That's how Java's new virtual threads are implemented.)
Threads and async/await are basically the same thing.
So why not drop this special word `async`?
You can drop the special word in Rust - it's just sugar for 'returns a poll-able function with state' - however, threads and async/await are not the same.
You can implement concurrency any way you like - you can run it in separate processes or on separate nodes if you are willing to put in the work - but that does not mean they are equivalent for most purposes.
Threads are almost always implemented preemptively while async is typically cooperative. Threads are heavy/costly in time and memory, while async is almost zero-cost. Threads are handed over to the kernel scheduler, while async is entirely controlled by the program('s executor).
Purely from a merit perspective, threads are simply a different trade-off - just like multi-processing and the distributed actor model are.
Keyword here being almost. See Project Loom.
@f_devd, cooperative vs preemptive is a good point.
(That threads are heavy or should be scheduled by the OS is not required by the nature of threads.)
But preemptive is strictly better (safer at least) than cooperative, right? Otherwise, one accidental endless loop, and this code occupies the executor, depriving all other futures of execution.
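A hedged sketch of exactly that failure mode, assuming tokio's single-threaded (current_thread) runtime; note that the program deliberately never prints "tick" and never terminates - that is the point:

```rust
use std::time::Duration;

#[tokio::main(flavor = "current_thread")]
async fn main() {
    // One accidental busy loop with no .await: this task never yields...
    tokio::spawn(async {
        loop {
            std::hint::spin_loop();
        }
    });

    // ...so this task never gets polled and "tick" is never printed.
    tokio::spawn(async {
        loop {
            tokio::time::sleep(Duration::from_millis(100)).await;
            println!("tick");
        }
    });

    // Even main's own sleep never completes: the whole executor is starved.
    tokio::time::sleep(Duration::from_secs(1)).await;
    println!("done"); // never reached
}
```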
@gpderetta, I think Project Loom will need to become preemptive; otherwise virtual threads cannot be used as a drop-in replacement for native threads - we will have deadlocks in virtual threads where they don't happen in native threads.
Preemptive is safer for liveness, since it avoids 'starvation' (one task's poll taking too long); however, in practice it is almost always more expensive in memory and time due to the implicit state.
In async, only the values required to do a poll need to be held (often only references), while for threads the entire stack & registers need to be stored at all times, since at any moment the thread could be interrupted and it will need to know where to continue from. And since it needs to save/overwrite all registers at each context switch (+ scheduler/kernel handling), it takes more time overall.
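A hedged sketch of that difference in held state (standard library only; the exact number printed will vary by compiler version):

```rust
use std::mem::size_of_val;

async fn small_state() {
    let buf = [0u8; 16];
    std::future::ready(()).await; // `buf` is the only data that must survive this await
    std::hint::black_box(&buf);
}

fn main() {
    let fut = small_state();
    // The future's size is roughly its live-across-await state plus bookkeeping --
    // tens of bytes here -- whereas an OS thread reserves a full stack
    // (commonly 1-8 MiB of address space) plus saved registers.
    println!("future size: {} bytes", size_of_val(&fut));
}
```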
In general, threads are a good option if you can afford the overhead, but assuming threads as the default can significantly hinder performance (or make it nearly impossible to even run) in the places where Rust needs to.
Java can afford that. M:N threads come with a heavy runtime. Java already has a heavy runtime, so what's a smidgen more flab?
Source: https://github.com/rust-lang/rfcs/blob/master/text/0230-remo...
So it seems the biggest issue was having a single IO interface, forcing overhead on both green and native threads and forcing runtime dispatching.
It seems to me that the best approach would have been to let the two libraries evolve separately and capture the common subset in a trait (possibly using dyn impls when type erasure is tolerable), so that you could write generic code that works with both, or specialized code to take advantage of specific features.
As it stands now, sync and async are effectively separated anyway, and it is currently impossible to write generic code that handles both.
Why not go one step further and invent "Parallel Rust"? And by parallel I mean it. Just a nice little keyword "parallel {}" where every statement inside the parallel block is executed in parallel, the same way it is done in HDLs. Rust's borrow checker should be able to ensure parallel code is safe. Of course one problem with this strategy is that we don't exactly have processors that are designed to spawn and process micro-threads. You would need to go back all the way to Sun's SPARC architecture for that and then extend it with the concept of a tree based stack so that multiple threads can share the same stack.
That would be a good step forward, I support it :)
BTW, do we need the `parallel` keyword, or better to simply let all code be parallel by default?
Haskell has entered the chat…
However, almost all of the most popular programming languages are imperative. I assume most programmers prefer to think of our programs as a series of steps which execute in sequence.
Mind you, arguably Excel is the most popular programming language in use today, and it has exactly this execution model.
The rayon crate lets you do something quite similar.
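For instance, a minimal sketch (rayon is an external crate you would add as a dependency; the workloads here are arbitrary):

```rust
use rayon::prelude::*;

fn main() {
    // Two independent computations run in parallel; join returns both results.
    let (sum, max) = rayon::join(
        || (1..=1_000_000u64).sum::<u64>(),
        || (1..=1_000_000u64).max(),
    );
    println!("sum = {sum}, max = {max:?}");

    // Or parallelize a whole iterator pipeline across the thread pool.
    let squares: Vec<u32> = (1u32..=10).into_par_iter().map(|x| x * x).collect();
    println!("{squares:?}");
}
```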
I believe the answer is "that implies a runtime", and Rust as a whole is not willing to pull that up into a language requirement.
This is in contrast to Haskell, Go, dynamic scripting languages, and, frankly, nearly every other language on the market. Almost everything has a runtime nowadays, and while each individually may be fine they don't always play well together. It is important that as C rides into the sunset (optimistic and aspirational, sure, but I hope and believe also true) and C++ becomes an ever more complex choice to make for various reasons that we have a high-power very systems-oriented programming language that will make that choice, because someone needs to.
You do not need to spawn threads/tasks eagerly. You can do it lazily via work stealing. See Cilk++.
Doesn't rayon have a syntax like that?
There's also writing your code with poll() and select(), which is its own thing.
Well, that's the great thing with async Rust: you write with poll and select without writing poll and select. Let the computer and the compiler get this detail out of my way (seriously, I don't want to manage the fd interest list myself).
And I can still write conceptually similar select code using the, well, select! macro provided by most async runtimes to do the same on a list of futures. Better separation, easier to read, and overall it boils down to the same thing.
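A hedged sketch with tokio's select! (any runtime's equivalent macro would look similar; the timers just stand in for real IO):

```rust
use tokio::time::{sleep, Duration};

#[tokio::main]
async fn main() {
    // Wait on several futures at once and run the branch of whichever completes first;
    // the losing future is simply dropped (cancelled).
    tokio::select! {
        _ = sleep(Duration::from_millis(10))  => println!("fast timer won"),
        _ = sleep(Duration::from_millis(100)) => println!("slow timer won"),
    }
}
```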
You mean like Haskell?
The answer is that you need an incredibly good compiler to make this behave adequately, and even then, every once in a while you'll get the wrong behavior and need to rewrite your code in a weird way.
IIRC withoutboats said in one of the posts that the true answer is compatibility with C.