
libxev: A cross-platform, high-performance event loop

mitchellh
11 replies
2d23h

This is my project.

I use it as the core cross-platform event loop layer for my terminal (https://mitchellh.com/ghostty). I still consider libxev an early, unstable project, but the terminal has been used daily by hundreds (now over a thousand) beta testers for over a year, so at least for that use case it's very stable. :) I know of others using it in production shipped software, but use it at your own risk.

As background, my terminal previously used libuv (the Node.js core event loop library), and I think libuv is a great project! I still have those Zig bindings available (archived) if anyone is interested: https://github.com/mitchellh/zig-libuv

The main issue I personally had with libuv was that I was noticing performance jitter due to heap allocations. libxev's main design goal was to be allocation-free, and it is. The caller is responsible for allocating all the memory libxev needs (however it decides to do that!) and passing it to libxev. There were some additional things I wanted: more direct access to Mach ports on macOS, io_uring on Linux (although I think libuv can use io_uring now), etc. But more carefully controlling memory allocation was the big one.
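To make the caller-allocates model concrete, here's roughly the shape of the timer example from the libxev README. This is from memory, so treat the exact names and signatures as approximate; the point is that the Completion is plain caller-owned memory that libxev only borrows:

    const xev = @import("xev");

    pub fn main() !void {
        var loop = try xev.Loop.init(.{});
        defer loop.deinit();

        const w = try xev.Timer.init();
        defer w.deinit();

        // Caller-owned state for this operation: stack, static, pool,
        // wherever. libxev itself never calls an allocator.
        var c: xev.Completion = undefined;
        w.run(&loop, &c, 5000, void, null, &timerCallback);

        try loop.run(.until_done);
    }

    fn timerCallback(
        _: ?*void,
        _: *xev.Loop,
        _: *xev.Completion,
        result: xev.Timer.RunError!void,
    ) xev.CallbackAction {
        _ = result catch unreachable;
        return .disarm;
    }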

And it worked! Under heavy IO load in my terminal project, p90 performance roughly matched libuv but my p99 performance was much, much better. Like, 10x or more better. I don't have those numbers in front of me anymore to back that up and my terminal project hasn't built with libuv in a very long time. But I consider the project a success for my use case.

You're probably better off using libuv (i.e. the Node loop, not my project) for your own project. But the main takeaway I'd give people is: don't be afraid to reimplement this kind of stuff for yourself. A purpose-built event loop isn't that complicated, and if your software isn't even cross-platform, it's really not complicated.

samsquire
3 replies
2d23h

Thank you for sharing.

What do you think are the next steps for a next generation event loop?

I've been experimenting with barriers/phasers, LMAX Disruptors and my own lock free algorithms.

I think the answer is some form of multithreaded structured concurrency with coroutines and io_uring.

I've been experimenting with making sending and receiving independently parallel with multiple io_urings ("split parallel io"), so you can process incoming traffic separately from the stream that generates data to send. Generating sends is not blocked by receive parsing, and vice versa.

I'm also interested in Seastar and reactors.
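For what it's worth, here's a rough Zig sketch of that split-parallel shape: one io_uring per direction, each drained on its own thread. It assumes std.os.linux.IoUring from a recent Zig standard library, and the loop bodies are skeletons (real code would queue recv/send SQEs before waiting):

    const std = @import("std");
    const linux = std.os.linux;

    // One ring per direction. Each thread queues its own SQEs (recv on
    // one, send on the other), so receive parsing never stalls sends.
    fn drain(ring: *linux.IoUring, name: []const u8) !void {
        while (true) {
            // Real code would queue recv/send SQEs here before waiting.
            _ = try ring.submit_and_wait(1);
            const cqe = try ring.copy_cqe();
            std.debug.print("{s}: completion res={d}\n", .{ name, cqe.res });
        }
    }

    pub fn main() !void {
        var rx = try linux.IoUring.init(256, 0);
        defer rx.deinit();
        var tx = try linux.IoUring.init(256, 0);
        defer tx.deinit();

        const rx_thread = try std.Thread.spawn(.{}, drain, .{ &rx, "rx" });
        const tx_thread = try std.Thread.spawn(.{}, drain, .{ &tx, "tx" });
        rx_thread.join();
        tx_thread.join();
    }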

nextaccountic
0 replies
1d13h

Is DPDK still needed after io_uring? io_uring can also do zero-copy packet processing.

edit: there is this thesis https://liu.diva-portal.org/smash/get/diva2:1789103/FULLTEXT...

From 5.1.5 Summary of Benchmarking Results (page 44):

Of the three different applications and frameworks, DPDK performs best in all aspects concerning throughput, packet loss, packet rate, and latency. The fastest throughput of DPDK was measured at about 25 Gbit/s and the highest packet rate was measured at about 9 million. The packet loss for DPDK stays under 10% most of the time, but for packet sizes 64 bytes and 128 bytes, and for transmission rates of 32% and over, the packet loss reaches a maximum of 60%. Latency stays at around 12 μs for all sizes and transmission rates under 32% and reaches a maximum latency of 1 ms for packets of size 1518 bytes with transmission rates of 64% and above.

Based on these results, it was determined that DPDK can optimally handle transmission rates up to around 64%; above rate 64%, performance increases are non-existent while packet loss and latency increase.

io_uring had a maximum throughput of 5.0 Gbit/s, achieved at a transmission rate of 16% or higher when the packet size was 1518 bytes. The packet loss was significant, especially for transmission rates over 16% and when packet size was below 1280 bytes. Generally, the packet loss decreased as packet sizes increased, for all transmission rates. The packet rate reached a maximum of approximately 460,000 packets per second. For higher transmission rates and larger packet sizes, the packet rate decreased, reaching a minimum of around 40,000 packets per second for a transmission rate of 1%. The latency of io_uring is highest at size 1518 and transmission rate 100%, at around 1.3 ms. For lower transmission rates under 64%, the latency decreases as packet size increases, reaching a minimum of around 20 to 30 μs.

The results of running io_uring at different transmission rates show that io_uring reaches its best performance on our system at around transmission rate 16%. Above rate 16% there are no improvements in performance, and latency and packet loss increase.

OK, 25 Gbit/s vs 5 Gbit/s seems like a huge difference, especially since io_uring was seeing higher packet loss as well.

jmakov
0 replies
1d20h

Thank you for that. It would be interesting to see benchmarks.

password4321
2 replies
2d19h

libxev's main design goal was to be allocation-free

Maybe "allocation-free" should be in the GitHub project description instead of or in addition to "high performance".

rgrmrts
1 replies
2d4h

It is:

Zero runtime allocations. This helps make runtime performance more predictable and makes libxev well suited for embedded environments.
password4321
0 replies
2d2h

To be clear, I am discussing the text under "About" in the top right, labeled as "Description" when edited, which currently states:

libxev is a cross-platform, high-performance event loop that provides abstractions for non-blocking IO, timers, events, and more and works on Linux (io_uring or epoll), macOS (kqueue), and Wasm + WASI. Available as both a Zig and C API.

... with no mention of zero-allocation, though yes, it is mentioned later as a feature in the README.

mattgreenrocks
0 replies
2d21h

Very nice! TBH, libuv sometimes felt like it was popular because it's popular, rather than because of sheer technical prowess. I was never comfortable with how much allocation it does, and I didn't always find its handling of platform primitives as useful as I'd like.

don't be afraid to reimplement this kind of stuff for you. A purpose-built event loop isn't that complicated,

Amen. There's no need to view the event loop as mysterious. It's just a while loop that is constantly coordinating IO.
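A toy single-fd version makes that point. The sketch below (Zig, assuming a standard library where poll lives under std.posix) is the entire structure; everything real loops add is bookkeeping around it:

    const std = @import("std");

    // The whole "event loop": wait for readiness, dispatch, repeat.
    pub fn main() !void {
        const stdin = std.io.getStdIn().handle;
        var fds = [_]std.posix.pollfd{.{
            .fd = stdin,
            .events = std.posix.POLL.IN,
            .revents = 0,
        }};

        while (true) {
            _ = try std.posix.poll(&fds, -1); // block until something is ready
            if ((fds[0].revents & std.posix.POLL.IN) != 0) {
                var buf: [256]u8 = undefined;
                const n = try std.posix.read(stdin, &buf);
                if (n == 0) break; // EOF: nothing left to coordinate
                std.debug.print("dispatch: {d} bytes readable\n", .{n});
            }
        }
    }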

keepamovin
0 replies
2d19h

Three, I wanted an event loop library that could build to WebAssembly (both WASI and freestanding), and that didn't really fit well into the goals or API style of existing libraries without bringing in something super heavy like Emscripten.

This is a cool motivation!

Could you drop this into Node to make "Nodeex"? A kind of experimental allocation-free Node that somehow carves the allocations out into another layer (admittedly still within the Node C code)?

ajoseps
0 replies
2d23h

I saw ghostty and thought, "isn't that the terminal written by the guy who co-founded HashiCorp?". I really enjoy your ghostty blog posts and will be checking out libxev!

adonese
0 replies
2d21h

(Off topic) but any chance you might include me in the ghostty private testers? (adonese@nil.sd)

gigatexal
2 replies
3d

On macOS, are kqueue and libdispatch/Grand Central Dispatch doing different things?

0x457
1 replies
2d21h

libdispatch/GCD is a task scheduler built on top of kqueue. It's meant for moving work off the main UI thread without having to think about how often you do that.

gigatexal
0 replies
2d20h

Thanks for clarifying!

Jarred
2 replies
2d22h

We copied libxev's code for the timer heap implementation in Bun for setTimeout & setInterval, and it was a ~6x throughput improvement[0] on Linux compared to our previous implementation.

[0]: https://twitter.com/jarredsumner/status/1736741811039899871
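For anyone curious about the structure: the pattern is a min-heap of deadlines, where the earliest deadline bounds how long the loop may sleep, and expired entries are popped on wakeup. Here's a hypothetical sketch using std.PriorityQueue; note libxev's actual implementation is its own allocation-free intrusive heap, not this:

    const std = @import("std");

    const Timer = struct {
        deadline_ms: u64,
        // a real loop would carry a callback + userdata here
    };

    fn earlier(_: void, a: Timer, b: Timer) std.math.Order {
        return std.math.order(a.deadline_ms, b.deadline_ms);
    }

    pub fn main() !void {
        var gpa = std.heap.GeneralPurposeAllocator(.{}){};
        defer _ = gpa.deinit();

        var heap = std.PriorityQueue(Timer, void, earlier).init(gpa.allocator(), {});
        defer heap.deinit();

        try heap.add(.{ .deadline_ms = 500 });
        try heap.add(.{ .deadline_ms = 100 });

        // The earliest deadline bounds how long the loop may block in
        // poll/io_uring_enter before it must wake up.
        std.debug.print("next timer due at {d} ms\n", .{heap.peek().?.deadline_ms});

        // On wakeup, pop and fire everything that is due.
        const now: u64 = 250;
        while (heap.peek()) |t| {
            if (t.deadline_ms > now) break;
            _ = heap.remove();
            std.debug.print("fire timer due at {d} ms\n", .{t.deadline_ms});
        }
    }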

hinkley
1 replies
2d21h

Why do you suppose Node is still 50% faster on this benchmark? V8 trickery, or something in the library differences?

Jarred
0 replies
2d21h

The timer heap currently lives on a different thread instead of the main thread, which means each timer has to be allocated and scheduled separately. Scheduling things onto other threads is expensive. The reason it works this way isn't a good one; we will fix it, but haven't prioritized it yet.

eqvinox
1 replies
2d21h

As someone maintaining a project with its own event loop: don't do it in larger projects.

The problem is that you'll start having dependencies on external libraries. And when those need event loop integration, things get messy. We've introduced bugs before, caused by subtle differences in semantics. (Like: does write imply read? Are events disarmed while running? What about errors?)

If the lib and event loop are reasonably popular, someone else has probably integrated them before. Or the lib supports the event loop natively (or uses libverto). Either saves you some trouble.

Also: please add libverto support for your event loop :) https://github.com/latchset/libverto

GoblinSlayer
0 replies
2d9h

The interface makes verto look like a Linux-first design, like git. But what's the point? Just implement epoll, like Illumos did. It also looks allocation-heavy, and apparently it can't drive a deno-style loop.

dsp_person
1 replies
3d

Can this be used to make something that feels like Qt's signals and slots?

jcelerier
0 replies
2d6h

Many signal/slot implementations are synchronous, with no event loop involved; the two are somewhat orthogonal. Even Qt calls slots synchronously most of the time without the event loop being involved; queueing the call through the event loop is just an additional feature.
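To make the "no event loop involved" point concrete, here's a hypothetical direct-connection signal in Zig: emit() runs every slot on the caller's stack before returning (the names and fixed capacity are made up for this sketch):

    const std = @import("std");

    // A synchronous signal: emit() just calls each connected slot in
    // order, on the caller's stack, with no loop or queue in between.
    fn Signal(comptime Arg: type) type {
        return struct {
            const Slot = *const fn (Arg) void;
            slots: [8]Slot = undefined,
            len: usize = 0,

            fn connect(self: *@This(), slot: Slot) void {
                self.slots[self.len] = slot; // no bounds check in this sketch
                self.len += 1;
            }

            fn emit(self: *@This(), arg: Arg) void {
                for (self.slots[0..self.len]) |slot| slot(arg);
            }
        };
    }

    fn onValue(v: i32) void {
        std.debug.print("slot ran synchronously: {d}\n", .{v});
    }

    pub fn main() void {
        var sig = Signal(i32){};
        sig.connect(&onValue);
        sig.emit(42); // equivalent to a Qt direct connection
    }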

jpgvm
0 replies
3d

Completion-based cross-platform I/O? Sign me up.

jauntywundrkind
0 replies
2d22h

io_uring support is obviously great & excellent, fulfills the "high performance" part well. brought an immediate smile to my face.

i was not expecting "Wasm + WASI" support at all. that's very cool. implementation is wasi_poll.zig (https://github.com/mitchellh/libxev/blob/main/src/backend/wa...). not to be unkind, but this makes me wonder very much if WASI is already missing the mark, if polling is the solution offered.

gotta say, this is some very understandable, clean code. further enhancing my sense that i really ought to be playing with zig.

hinkley
0 replies
2d21h

I was going to say, "I wonder if Bun.js would/could use this" but it looks like Jarred Sumner has been cherry-picking bits of libxev for at least six months.